Lei Zhang
Chief Scientist
Computer Vision & Robotics
International Digital Economy Academy (IDEA)
Guangdong-Hong Kong-Macao Greater Bay Area
Shenzhen, China
Email: leizhang AT idea dot edu dot cn
I am the Chief Scientist of Computer Vision and Robotics at the International Digital Economy Academy (IDEA) and an Adjunct Professor at the Hong Kong University of Science and Technology (Guangzhou). Prior to this, I was a Principal Researcher and Research Manager at Microsoft, which I joined in 2001. I spent 12 years at Microsoft Research Asia (MSRA) and, from 2013 to 2021, worked in Bing Multimedia, Microsoft Research (MSR, Redmond), and Microsoft Cloud & AI. My research interests are in computer vision and machine learning. I am particularly interested in large-scale generic visual recognition and was named an IEEE Fellow for my contributions in this area.
I have served as an editorial board member for IEEE T-MM, IEEE T-CSVT, and the Multimedia Systems journal, and as a program co-chair, area chair, or committee member for many top conferences. I have published 150+ papers and hold 60+ US patents.
I received all my degrees (B.E., M.E., and Ph.D.) in Computer Science from Tsinghua University.
Our team has made significant contributions to object detection, including DINO, detrex, Grounding DINO, Grounded SAM, T-Rex2, and Grounding DINO 1.5.
We are hiring both researchers and engineers working on computer vision and machine learning. Note that we don’t draw a clear boundary between these two roles: most of our high-impact projects are the result of teamwork between researchers and engineers. We therefore expect our researchers to have solid engineering skills and our engineers to have a good knowledge of research algorithms.
We are also looking for research and engineering interns. In particular, we are currently looking for interns who have solid experience in model architecture design, optimization, and compression, and are interested in deploying vision models on resource-constrained hardware devices.
I normally recruit one PhD student each year at SCUT and, starting in 2024, 1-2 PhD students per year at HKUST-GZ, in the general area of computer vision, including object detection, segmentation, open-vocabulary recognition, multi-modality understanding, content generation, etc.
If you are interested, please send me your resume with the subject line “Job application in IDEA”, “Intern application in IDEA”, or “PhD application”. I won’t be able to reply to every email, but if your experience is a good fit for our team, I will get back to you.
For all positions, we value your core skills. If you’ve done any interesting research or made extensive contributions to any open-source project, highlight that in your resume and inquiry email!
Grounding DINO 1.5 Pro
Grounding DINO 1.5 Edge
Grounding DINO 1.5 improves Grounding DINO with two distinct models. The Pro model aims to push the boundary of open-set object detection, setting new records on zero-shot detection benchmarks: 54.3 AP on COCO, 55.7 AP on LVIS, and 30.2 AP on ODinW. The Edge model is optimized for speed, achieving 75.2 FPS on NVIDIA A100 and over 10 FPS on NVIDIA Orin NX while maintaining an excellent balance between efficiency and accuracy.
[ Paper | Blog | Playground ]
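Grounding DINO 1.5 is served through an API (see the Playground above), but the open-source Grounding DINO repository illustrates the same text-prompted detection workflow. Below is a minimal sketch using the inference utilities from https://github.com/IDEA-Research/GroundingDINO; the config/checkpoint paths, image path, prompt, and thresholds are illustrative placeholders.

# Text-prompted open-set detection with the open-source Grounding DINO.
# Paths, prompt, and thresholds below are illustrative, not prescriptive.
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # config shipped with the repo
    "weights/groundingdino_swint_ogc.pth",              # downloaded checkpoint
)

image_source, image = load_image("assets/demo.jpg")

# Categories are given as free-form text, conventionally separated by " . ".
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="chair . person . dog .",
    box_threshold=0.35,
    text_threshold=0.25,
)

annotated = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated.jpg", annotated)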
T-Rex2 uniquely combines text and visual prompts for open-set object detection. In particular, its visual prompt capability can unlock a wide range of scenarios without the need for task-specific tuning or extensive training datasets.
[ Paper | Blog | Playground ]
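To make the visual-prompt idea concrete, here is a toy, self-contained sketch (not T-Rex2’s actual implementation): embed the user-drawn prompt box, embed candidate regions, and keep the regions whose embeddings are similar enough to the prompt. All names, dimensions, and thresholds here are hypothetical.

# Toy illustration of visual-prompt matching; random vectors stand in for
# real region embeddings. This is NOT T-Rex2's implementation.
import numpy as np

def cosine_similarity(prompt: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    prompt = prompt / np.linalg.norm(prompt)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ prompt

rng = np.random.default_rng(0)
prompt_embedding = rng.normal(size=256)            # embedding of the prompt box
candidate_embeddings = rng.normal(size=(10, 256))  # embeddings of region proposals

scores = cosine_similarity(prompt_embedding, candidate_embeddings)
keep = np.where(scores > 0.2)[0]  # illustrative similarity threshold
print(f"regions matching the visual prompt: {keep.tolist()}")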
@inproceedings{yang2024openworld,
  title     = {Open-World Human-Object Interaction Detection via Multi-modal Prompts},
  author    = {Yang, Jie and Li, Bingliang and Zeng, Ailing and Zhang, Lei and Zhang, Ruimao},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}

@inproceedings{li2024visual,
  title     = {Visual In-Context Prompting},
  author    = {Li, Feng and Jiang, Qing and Zhang, Hao and Liu, Shilong and Xu, Huaizhe and Zou, Xueyan and Ren, Tianhe and Li, Hongyang and Zhang, Lei and Li, Chunyuan and Yang, Jianwei and Gao, Jianfeng},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}

@inproceedings{huang2024tag,
  title     = {Tag2Text: Guiding Vision-Language Model via Image Tagging},
  author    = {Huang, Xinyu and Zhang, Youcai and Ma, Jinyu and Tian, Weiwei and Feng, Rui and Zhang, Yuejie and Li, Yaqian and Guo, Yandong and Zhang, Lei},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

@inproceedings{liu2024symbol,
  title     = {Symbol as Points: Panoptic Symbol Spotting via Point-based Representation},
  author    = {Liu, Wenlong and Yang, Tianyu and Wang, Yuhan and Yu, Qizhi and Zhang, Lei},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

@inproceedings{cheng2024progressive,
  title     = {Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts},
  author    = {Cheng, Xinhua and Yang, Tianyu and Wang, Jianan and Li, Yu and Zhang, Lei and Zhang, Jian and Yuan, Li},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

@inproceedings{huang2024dreamtime,
  title     = {DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation},
  author    = {Huang, Yukun and Wang, Jianan and Shi, Yukai and Tang, Boshi and Qi, Xianbiao and Zhang, Lei},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

@inproceedings{shi2024toss,
  title     = {TOSS: High-quality Text-guided Novel View Synthesis from a Single Image},
  author    = {Shi, Yukai and Wang, Jianan and Cao, He and Tang, Boshi and Qi, Xianbiao and Yang, Tianyu and Huang, Yukun and Liu, Shilong and Zhang, Lei and Shum, Heung-Yeung},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

@inproceedings{huang2023dreamwaltz,
  title     = {DreamWaltz: Make a Scene with Complex 3D Animatable Avatars},
  author    = {Huang, Yukun and Wang, Jianan and Zeng, Ailing and Cao, He and Qi, Xianbiao and Shi, Yukai and Zha, Zheng-Jun and Zhang, Lei},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2023}
}

@inproceedings{cai2023smplerx,
  title     = {SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation},
  author    = {Cai, Zhongang and Yin, Wanqi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Wang, Yanjun and Pang, Hui En and Mei, Haiyi and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yang, Lei and Liu, Ziwei},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2023}
}

@inproceedings{liu2023comprehensive,
  title     = {A Comprehensive Benchmark for Neural Human Body Rendering},
  author    = {Liu, Kenkun and Jin, Derong and Zeng, Ailing and Han, Xiaoguang and Zhang, Lei},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2023}
}