Khiem Vuong

I'm a PhD student at Carnegie Mellon University (Robotics Institute), where I'm fortunate to be advised by Deva Ramanan and Srinivasa Narasimhan. During my PhD, I also interned at Apple (VIO/SLAM, VCV) working on 3D foundation models for AR/VR applications.

Previously, I received my BS in Computer Science from the University of Minnesota, working with Stergios Roumeliotis and Hyun Soo Park on VI-SLAM and egocentric scene understanding.

I'm interested in computer vision, robotics, and machine learning, especially 3D/4D reconstruction and visual scene understanding. Currently, I'm exploring the intersection of video generative models and 3D/4D reconstruction.

Feel free to reach out to me at kvuong at andrew.cmu.edu!

Email  |  Resume  |  GitHub  |  Google Scholar


News
01/2026   I will join Meta Reality Labs as an intern this summer (XR World AI team, working with Peter Kontschieder)!
03/2025   AerialMegaDepth is accepted to CVPR 2025!
02/2025   I will join Apple as an intern this summer (VIO/SLAM team, Video Computer Vision)!
03/2024   WALT3D is accepted to CVPR 2024 as an Oral (top 0.8%)!
10/2023   Toward Planet-Wide Traffic Camera Calibration is accepted to WACV 2024.

Publications
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Khiem Vuong, Anurag Ghosh, Deva Ramanan*, Srinivasa Narasimhan*, Shubham Tulsiani*
CVPR 2025   (* equal contribution/advising)
paper · project page · 3D web viewer · code
A scalable data generation framework that combines mesh renderings with real images, enabling robust 3D reconstruction across extreme viewpoint variations (e.g., aerial-ground).
WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion
Khiem Vuong*, N Dinesh Reddy*, Robert Tamburo, Srinivasa G. Narasimhan
CVPR 2024 (Oral, top 0.8%)   (* equal contribution)
paper · project page · video
Automatically generates realistic training data from time-lapse videos for reconstructing dynamic objects (people, vehicles) under occlusion.
Toward Planet-Wide Traffic Camera Calibration
Khiem Vuong, Robert Tamburo, Srinivasa G. Narasimhan
WACV 2024
paper · project page · video · code
3D scene reconstruction and precise localization of over 100 real-world traffic cameras.
Egocentric Scene Understanding via Multimodal Spatial Rectifier
Tien Do, Khiem Vuong, Hyun Soo Park
CVPR 2022 (Oral, top 4.2%)
paper · project page · video · code
Extension of the spatial rectifier to the multi-directional case, applied to depth and surface normal prediction from egocentric views and accompanied by a novel egocentric RGB-D dataset.
Surface Normal Estimation of Tilted Images via Spatial Rectifier
Tien Do, Khiem Vuong, Stergios I. Roumeliotis, Hyun Soo Park
ECCV 2020 (Spotlight, top 3%)
paper · project page · video · code
Robust surface normal estimation by spatially rectifying the image to densely distributed orientations.
Deep Multi-view Depth Estimation with Predicted Uncertainty
Tong Ke, Tien Do, Khiem Vuong, Kourosh Sartipi, Stergios I. Roumeliotis
ICRA 2021
paper · code
Depth estimation by multiview triangulation, followed by an iterative depth refinement module that preserves estimates with high triangulation confidence.
Deep Depth Estimation from Visual-Inertial SLAM
Kourosh Sartipi, Tien Do, Tong Ke, Khiem Vuong, Stergios I. Roumeliotis
IROS 2020
paper · code
Depth estimation by completing a sparse VI-SLAM point cloud using planar constraints from robust surface normal prediction.
Academic Services

Teaching Assistant: Computer Vision (16-720, Fall 2024) and Computer Vision (16-385, Spring 2025) at CMU

Conference Reviewer: NeurIPS (2022), CVPR (2023, 2024, 2025), ICCV (2023, 2025), ECCV (2024), WACV (2024, 2025), IROS (2024), AAAI (2025), ICLR (2025)


template inspired by Andrew Owens