Weitai Kang (康伟泰)

I am a 4th-year Ph.D. student working with Prof. Yan Yan in Computer Science at the University of Illinois Chicago. I will graduate in May 2027.

I work on Large Multimodal Models across text, image, GUI, 3D, and video. I specialize in Visual Grounding, Generalist Models, AI Agents, Reinforcement Learning, and Image Editing.

I have published 15 papers, including 7 first-author papers at top-tier venues (CVPR, NeurIPS, ICCV, ICLR, ECCV, and ACM MM). I have interned at Netflix, Adobe, SonyAI, Tencent, and SenseTime. I have been a visiting scholar at the University of Central Florida, working with Prof. Mubarak Shah. Before starting my Ph.D., I received my bachelor's degree in Mathematics from Sun Yat-sen University in 2022, where I was awarded the Outstanding Student Scholarship each year.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub  /  Twitter  /  Hi~


I am actively seeking full-time Research Scientist positions starting in 2027. Feel free to reach out if there is a good fit.


Work Experience
Netflix · Research Internship · May 2026 - Aug. 2026

Research on Video Understanding

Los Gatos, California

Adobe · Research Internship · May 2025 - May 2026

Research on Visual Grounding and Image Editing

San Jose, California and Chicago, Illinois

SonyAI · Research Internship · Oct. 2024 - Dec. 2024

Research on 2D Large Multimodal Models

Chicago, Illinois

Tencent · Machine Learning Engineer Internship · Oct. 2021 - Jul. 2022

Work on Human Pose Detection

Shenzhen, Guangdong

SenseTime · Research Internship · Jul. 2021 - Sep. 2021

Research on Video Super-Resolution

Shenzhen, Guangdong


Publications
Inline Critic Steers Image Editing
Weitai Kang, Xiaohang Zhan, Yizhou Wang, Mang Tik Chiu, Jason Kuen, Kangning Liu, Yan Yan
PDF
VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
Weitai Kang, Jason Kuen, Mengwei Ren, Zijun Wei, Yan Yan, Kangning Liu
CVPR 2026. PDF
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
Weitai Kang, Bin Lei, Gaowen Liu, Caiwen Ding, Yan Yan
ICLR 2026. PDF
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
ICCV 2025. PDF / Code
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction
Bin Lei*, Weitai Kang*, Zijian Zhang, Winson Chen, Xi Xie, Shan Zuo, Mimi Xie, Ali Payani, Mingyi Hong, Yan Yan, Caiwen Ding
NeurIPS 2025. * Equal contribution. PDF / Code
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan
ICLR 2025. Project Page / PDF / Code
AttBalance: Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan
ACM MM 2025. PDF
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan
ECCV 2024. PDF / Code
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
Weitai Kang, Weiming Zhuang, Zhizhong Li, Yan Yan, Lingjuan Lyu
PDF
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan
PDF
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang, Mengxue Qu, Yunchao Wei, Yan Yan
PDF
3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation
Wenxin Chen, Mengxue Qu, Weitai Kang, Yan Yan, Yao Zhao, Yunchao Wei
TMM 2025. PDF
Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen
PDF
On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan
CVPR 2024. PDF
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan
CVPR 2024. PDF

You can also reach me through WeChat: Victor_Hong_