Sipeng Zheng

Cited by

	All	Since 2019
Citations	122	122
h-index	6	6
i10-index	5	5

Citations per year: 2020 (8), 2021 (13), 2022 (15), 2023 (49), 2024 (37)

Public access

4 articles

2 articles

available

not available

Based on funding mandates

Sipeng Zheng

Beijing Academy of Artificial Intelligence (BAAI)

Verified email at baai.ac.cn

Computer Vision, Large Multimodal Model, Agent Learning


Title	Cited by	Year
Few-shot action recognition with hierarchical matching and contrastive learning. S Zheng, S Chen, Q Jin. European Conference on Computer Vision, 297-313, 2022	27	2022
Visual relation detection with multi-level attention. S Zheng, S Chen, Q Jin. Proceedings of the 27th ACM International Conference on Multimedia, 121-129, 2019	23	2019
Skeleton-based interactive graph network for human object interaction detection. S Zheng, S Chen, Q Jin. 2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2020	16	2020
Relation understanding in videos. S Zheng, X Chen, S Chen, Q Jin. Proceedings of the 27th ACM International Conference on Multimedia, 2662-2666, 2019	15	2019
VRDFormer: End-to-end video visual relation detection with transformers. S Zheng, S Chen, Q Jin. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	12	2022
Open-category human-object interaction pre-training via language modeling framework. S Zheng, B Xu, Q Jin. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	7	2023
Llama rider: Spurring large language models to explore the open world. Y Feng, Y Wang, J Liu, S Zheng, Z Lu. arXiv preprint arXiv:2310.08922, 2023	5	2023
Towards general computer control: A multimodal agent for Red Dead Redemption II as a case study. W Tan, Z Ding, W Zhang, B Li, B Zhou, J Yue, H Xia, J Jiang, L Zheng, ... arXiv preprint arXiv:2403.03186, 2024	4	2024
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds. S Zheng, Y Feng, Z Lu. The Twelfth International Conference on Learning Representations, 2023	4	2023
Accommodating audio modality in CLIP for multimodal processing. L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin. Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023	4	2023
Exploring anchor-based detection for Ego4D natural language query. S Zheng, Q Zhang, B Liu, Q Jin, J Fu. arXiv preprint arXiv:2208.05375, 2022	4	2022
Anchor-based detection for natural language localization in ego-centric videos. B Liu, S Zheng, J Fu, WH Cheng. 2023 IEEE International Conference on Consumer Electronics (ICCE), 01-04, 2023	1	2023
UniCode: Learning a Unified Codebook for Multimodal Large Language Models. S Zheng, B Zhou, Y Feng, Y Wang, Z Lu. arXiv preprint arXiv:2403.09072, 2024		2024
SPAFormer: Sequential 3D Part Assembly with Transformers. B Xu, S Zheng, Q Jin. arXiv preprint arXiv:2403.05874, 2024		2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World. B Xu, S Zheng, Q Jin. Proceedings of the 31st ACM International Conference on Multimedia, 2807-2816, 2023		2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection. Q Zhang, S Zheng, Q Jin. arXiv preprint arXiv:2307.10567, 2023		2023
Supplementary Material for Open-Category Human-Object Interaction Pre-training via Language Modeling Framework. S Zheng, B Xu, Q Jin. relation 50 (100), 100, 0
Supplementary Material for VRDFormer: End-to-End Video Visual Relation Detection with Transformers. S Zheng, S Chen, Q Jin
