Weitai Kang
I am currently a third-year Ph.D. student advised by Prof. Yan Yan in Computer Science at the University of Illinois at Chicago (UIC).
My research focuses on Multimodal Learning, particularly Foundation Models, Multimodal Large Language Models, and 2D & 3D Visual Grounding.
In the meantime, I am a Research Intern at Sony AI, working on Multimodal Foundation Models and Generative AI.
Before that, I was a Visiting Scholar at the University of Central Florida (UCF), working with Prof. Mubarak Shah on 3D Visual Grounding and 3D Large Language Models.
Before my Ph.D., I earned my bachelor's degree in Mathematics from Sun Yat-sen University in 2022.
I am actively seeking 2025 Summer Internship opportunities.
Email / CV / Google Scholar / LinkedIn / Github / Twitter
News
[11/2024] Our paper, Infant Agent, on AI Agents is now available on arXiv.
[10/2024] My first-author paper, Robin3D, on 3D LLMs is now available on arXiv.
[09/2024] Our paper, INTP-Video-LLM, on Video LLMs is now available on arXiv.
[07/2024] My first-author paper, SegVG, is accepted to ECCV 2024. The code is now open-sourced.
[04/2024] Our paper, SaCo, on Transformer Explainability is accepted to CVPR 2024.
[03/2024] Our paper, TokenTM, on Transformer Explainability is accepted to CVPR 2024.
[02/2024] My first-author paper, Intent3D, on 3D Intention Grounding is now available on arXiv.
[10/2023] My first-author paper, ACTRESS, on Visual Grounding is now available on arXiv.
[08/2023] I am a Teaching Assistant for CS 577: Deep Learning at Illinois Institute of Technology.
[04/2023] My first-author paper, SegVG, on Visual Grounding is now available on arXiv.
[01/2023] My first-author paper, AttBalance, on Visual Grounding constraints is now available on arXiv.
[08/2022] I joined Prof. Yan Yan's group as a Ph.D. student.
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
PDF / Code
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan
PDF
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan
ECCV, 2024
PDF / Code
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan
PDF
Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen
PDF
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang, Mengxue Qu, Yunchao Wei, Yan Yan
PDF
AttBalance: Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan
PDF
On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan
CVPR, 2024
PDF
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan
CVPR, 2024
PDF