Weitai Kang

I am a third-year Ph.D. student in Computer Science at the University of Illinois Chicago (UIC), working with Prof. Yan Yan. My research focuses on Multimodal Learning, particularly Large Multimodal Models and Visual Grounding.

I will be a Research Intern at Adobe, working on Large Multimodal Models. Previously, I was a Research Intern at SonyAI, also working on Large Multimodal Models. Before that, I was a Visiting Scholar at the University of Central Florida (UCF), working with Prof. Mubarak Shah on 3D Visual Grounding and 3D Large Language Models. Before my Ph.D., I earned my bachelor's degree in Mathematics from Sun Yat-sen University in 2022.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub  /  Twitter

News
  • [01/2025] My first-author paper, Intent3D, has been accepted to ICLR 2025!
  • [11/2024] Our paper on AI agents, Infant Agent, is now available on arXiv.
  • [10/2024] My first-author paper on 3D LLMs, Robin3D, is now available on arXiv.
  • [09/2024] Our paper on Video LLMs, INTP-Video-LLM, is now available on arXiv.
  • [07/2024] My first-author paper, SegVG, has been accepted to ECCV 2024! The code is now open-sourced.
  • [04/2024] Our paper on Transformer Explainability, SaCo, has been accepted to CVPR 2024.
  • [03/2024] Our paper on Transformer Explainability, TokenTM, has been accepted to CVPR 2024.
  • [02/2024] My first-author paper on 3D Intention Grounding, Intent3D, is now available on arXiv.
  • [10/2023] My first-author paper on Visual Grounding, ACTRESS, is now available on arXiv.
  • [08/2023] I am a Teaching Assistant for CS 577: Deep Learning at Illinois Institute of Technology.
  • [04/2023] My first-author paper on Visual Grounding, SegVG, is now available on arXiv.
  • [01/2023] My first-author paper on constraint balancing for Visual Grounding, AttBalance, is now available on arXiv.
  • [08/2022] I joined Prof. Yan Yan's group as a Ph.D. student.
Publications
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan

PDF / Code

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan

ICLR, 2025
Project Page / PDF / Code

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan

ECCV, 2024
PDF / Code

Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner

Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan

PDF

Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage

Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen

PDF

ACTRESS: Active Retraining for Semi-supervised Visual Grounding

Weitai Kang, Mengxue Qu, Yunchao Wei, Yan Yan

PDF

AttBalance: Visual Grounding with Attention-Driven Constraint Balancing

Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan

PDF

On the Faithfulness of Vision Transformer Explanations

Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

CVPR, 2024
PDF

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan

CVPR, 2024
PDF

Work Experience
Adobe · Research Internship

Research on Large Multimodal Models.

May 2025 - Aug. 2025, San Jose, California, United States · On-site

SonyAI · Research Internship

Research on 2D Large Multimodal Models.

Oct. 2024 - Dec. 2024, Chicago, Illinois, United States · Remote

Tencent · Machine Learning Engineer Internship

Worked on Human Pose Detection.

Oct. 2021 - Jul. 2022, Shenzhen, Guangdong, China · On-site

SenseTime · Research Internship

Research on Video Super-Resolution.

Jul. 2021 - Sep. 2021, Shenzhen, Guangdong, China · On-site


You can also reach me through WeChat: Victor_Hong_.