Learning to assemble neural module tree networks for visual grounding D Liu, H Zhang, F Wu, ZJ Zha Proceedings of the IEEE International Conference on Computer Vision, 4673-4682, 2019 | 235 | 2019 |
Context-aware visual policy network for fine-grained image captioning ZJ Zha, D Liu, H Zhang, Y Zhang, F Wu IEEE transactions on pattern analysis and machine intelligence 44 (2), 710-722, 2019 | 148 | 2019 |
More grounded image captioning by distilling image-text matching model Y Zhou, M Wang, D Liu, Z Hu, H Zhang Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 146 | 2020 |
Learning to compose and reason with language tree structures for visual grounding R Hong, D Liu, X Mo, X He, H Zhang IEEE transactions on pattern analysis and machine intelligence 44 (2), 684-696, 2019 | 123 | 2019 |
Context-aware visual policy network for sequence-level image captioning D Liu, ZJ Zha, H Zhang, Y Zhang, F Wu Proceedings of the 2018 ACM on Multimedia Conference, 1416-1424, 2018 | 117 | 2018 |
Learning to discretely compose reasoning module networks for video captioning G Tan, D Liu, M Wang, ZJ Zha Proceedings of the Twenty-Ninth International Joint Conference on Artificial …, 2020 | 76 | 2020 |
SemMAE: Semantic-guided masking for learning masked autoencoders G Li, H Zheng, D Liu, C Wang, B Su, C Zheng Advances in Neural Information Processing Systems 35, 14290-14302, 2022 | 68 | 2022 |
Modeling image composition for complex scene generation Z Yang, D Liu, C Wang, J Yang, D Tao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 37 | 2022 |
TransVG++: End-to-end visual grounding with language conditioned vision transformer J Deng, Z Yang, D Liu, T Chen, W Zhou, Y Zhang, H Li, W Ouyang IEEE transactions on pattern analysis and machine intelligence, 2023 | 22 | 2023 |
Compact bidirectional transformer for image captioning Y Zhou, Z Hu, D Liu, H Ben, M Wang arXiv preprint arXiv:2201.01984, 2022 | 16 | 2022 |
Joint Visual Grounding with Language Scene Graphs D Liu, H Zhang, ZJ Zha, M Wang, Q Sun arXiv preprint arXiv:1906.03561, 2019 | 12* | 2019 |
Cocktail: Mixing multi-modality control for text-conditional image generation M Hu, J Zheng, D Liu, C Zheng, C Wang, D Tao, TJ Cham Thirty-seventh Conference on Neural Information Processing Systems, 2023 | 8 | 2023 |
Modeling video as stochastic processes for fine-grained video representation learning H Zhang, D Liu, Q Zheng, B Su Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 7 | 2023 |
Semantically-consistent dynamic blurry image generation for image deblurring Z Jing, Y Zhang, C Wang, D Liu, Y Xia Proceedings of the 30th ACM International Conference on Multimedia, 2547-2555, 2022 | 3 | 2022 |
Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion Z Yang, T Chu, X Lin, E Gao, D Liu, J Yang, C Wang IEEE Transactions on Circuits and Systems for Video Technology, 2023 | 2 | 2023 |
ESceme: Vision-and-Language Navigation with Episodic Scene Memory Q Zheng, D Liu, C Wang, J Zhang, D Wang, D Tao arXiv preprint arXiv:2303.01032, 2023 | 2 | 2023 |
MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis J Zheng, D Liu, C Wang, M Hu, Z Yang, C Ding, D Tao arXiv preprint arXiv:2305.05992, 2023 | 1 | 2023 |
Cross-Modal Contrastive Learning for Robust Reasoning in VQA Q Zheng, C Wang, D Liu, D Wang, D Tao arXiv preprint arXiv:2211.11190, 2022 | 1 | 2022 |
Language-conditioned region proposal and retrieval network for referring expression comprehension Y Xie, D Liu, X Chen, ZJ Zha Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021 | 1 | 2021 |
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System C Xue, W Liu, S Xie, Z Wang, J Li, X Peng, L Ding, S Zhao, Q Cao, Y Yang, ... arXiv preprint arXiv:2303.00501, 2023 | | 2023 |