Weitai Kang
I am currently a third-year Ph.D. student in Computer Science at the University of Illinois Chicago (UIC), advised by Prof. Yan Yan.
My research focuses on Multimodal Learning, particularly for Large Multimodal Models and Visual Grounding.
I will be a Research Intern at Adobe, working on Large Multimodal Models.
Previously, I was a Research Intern at SonyAI, working on Large Multimodal Models.
Before that, I was a Visiting Scholar at the University of Central Florida (UCF), working with Prof. Mubarak Shah
on 3D Visual Grounding and 3D Large Language Models.
Before my Ph.D., I earned a bachelor's degree in Mathematics from Sun Yat-sen University in 2022.
Email / CV / Google Scholar / LinkedIn / GitHub / Twitter
News
- [01/2025] My first-author paper, Intent3D, is accepted to ICLR 2025!!!
- [11/2024] Our paper, Infant Agent, on AI agents is now available on arXiv.
- [10/2024] My first-author paper, Robin3D, on 3D LLMs is now available on arXiv.
- [09/2024] Our paper, INTP-Video-LLM, on Video LLMs is now available on arXiv.
- [07/2024] My first-author paper, SegVG, is accepted to ECCV 2024!!! The code is now open-sourced.
- [04/2024] Our paper, SaCo, on Transformer Explainability is accepted to CVPR 2024.
- [03/2024] Our paper, TokenTM, on Transformer Explainability is accepted to CVPR 2024.
- [02/2024] My first-author paper, Intent3D, on 3D Intention Grounding is now available on arXiv.
- [10/2023] My first-author paper, ACTRESS, on Visual Grounding is now available on arXiv.
- [08/2023] I am a Teaching Assistant for CS 577: Deep Learning at Illinois Institute of Technology.
- [04/2023] My first-author paper, SegVG, on Visual Grounding is now available on arXiv.
- [01/2023] My first-author paper, AttBalance, on constraint balancing for Visual Grounding is now available on arXiv.
- [08/2022] I joined Prof. Yan Yan's group as a Ph.D. student.
Publications
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
PDF / Code
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan
ICLR, 2025
Project Page / PDF / Code
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan
ECCV, 2024
PDF / Code
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan
PDF
Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen
PDF
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang, Mengxue Qu, Yunchao Wei, Yan Yan
PDF
AttBalance: Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan
PDF
On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan
CVPR, 2024
PDF
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan
CVPR, 2024
PDF
Experience
Adobe · Research Internship
Research on Large Multimodal Models.
May 2025 - Aug. 2025, San Jose, California, United States · On-site
SonyAI · Research Internship
Research on 2D Large Multimodal Models.
Oct. 2024 - Dec. 2024, Chicago, Illinois, United States · Remote
Tencent · Machine Learning Engineer Internship
Work on Human Pose Detection.
Oct. 2021 - Jul. 2022, Shenzhen, Guangdong, China · On-site
SenseTime · Research Internship
Research on Video Super-Resolution.
Jul. 2021 - Sep. 2021, Shenzhen, Guangdong, China · On-site