Jaewoo (Jeffrey) Heo
Hey! I'm Jaewoo (Jeffrey), a coterminal master's (MS) student in Computer Science at Stanford University. I'm a research assistant at Stanford MARVL (Medical AI and ComputeR Vision Lab), where I am advised by Professor Serena Yeung-Levy. I am also part of the Stanford Vision and Learning Lab, where I work with Professor Ehsan Adeli in the Partnership in AI-Assisted Care (PAC) group.
I received my B.S. in Computer Science with Honors from Stanford University in 2024. During my studies at Stanford, I have focused primarily on computer vision and medical AI research. My recent work on query-agnostic deformable cross-attention, diffusion models as motion priors, and vision-language models for reliable data generation aims to advance our computational understanding of human behavior and interaction in 3D.
In my free time, I enjoy playing the guitar, writing music, playing tennis, and watching soccer. I am a huge fan of The Beatles, Pink Floyd, and Billy Joel. I could never listen to music while coding, but I've recently developed a taste for putting on some bossa nova while working.
Email / CV / Scholar / GitHub
Research
"Understanding the problem is half the solution"
My research interests lie in computer vision and deep learning for understanding human behavior, motion, and interaction, particularly in healthcare contexts. I am driven by the challenge of creating AI tools that can understand humans in 4D (3D spatial + 1D temporal) to make meaningful inferences about the human condition. My work aims to bridge the gap between technical innovation and real-world application, integrating novel algorithms with domain-specific challenges in healthcare and beyond.
DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery
Jaewoo Heo, George Hu, Zeyu Wang, Serena Yeung-Levy
3DV, 2025 (accepted)
project page / arXiv / code (coming soon)
Proposes a novel query-agnostic deformable cross-attention mechanism that allows the model to attend to relevant spatial features more flexibly and in a data-dependent manner. Achieves state-of-the-art performance on the 3D human mesh recovery benchmarks 3DPW and RICH.
Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera
Jaewoo Heo, Kuan-Chieh Wang, Karen Liu, Serena Yeung-Levy
arXiv, 2024
project page / arXiv / code (coming soon)
A 3D global HMR model that leverages the motion diffusion model (MDM) as a prior for coherent human motion. The model is robust to dynamic camera motion and handles long videos.
Ask, Pose, Unite: Scaling Data Acquisition for Close Interactions with Vision Language Models
Laura Bravo-Sánchez, Jaewoo Heo, Zhenzhen Weng, Kuan-Chieh Wang, Serena Yeung-Levy
CVPR, 2025 (submitted)
project page / arXiv / code (coming soon)
Proposes a novel data generation method for close human interactions that leverages noisy automatic annotations to scale data acquisition, producing pseudo-ground-truth meshes from in-the-wild images.
NeuHMR: Neural Rendering-Guided Human Motion Reconstruction
Tiange Xiang, Kuan-Chieh Wang, Jaewoo Heo, Ehsan Adeli, Serena Yeung-Levy, Scott Delp, Li Fei-Fei
3DV, 2025 (accepted)
project page (coming soon) / arXiv (coming soon) / code (coming soon)
Rethinks the dependence on the 2D keypoint-fitting paradigm and presents NeuHMR, an optimization-based mesh recovery framework built on recent advances in neural rendering (NeRF).
I referred to this website's source code to build my own.