Chen-Hsuan Lin

Chen-Hsuan is the first name (neither just Chen nor Hsuan).
Hsuan is pronounced like "shoo-en" with a quick transition.
Staff Research Scientist & Research Manager @ NVIDIA
I am a staff research scientist and research manager in the NVIDIA Cosmos Lab, building NVIDIA Cosmos world models for physical AI. I lead a team developing interactive world models — AI systems that enable robots to perceive, simulate, and interact with the physical world in real time. My research focuses on recovering and synthesizing the 3D structure and dynamics of the visual world, work recognized as one of TIME Magazine's Best Inventions of 2023.
I received my Ph.D. in Robotics from Carnegie Mellon University, where I was supported by the NVIDIA Graduate Fellowship. Before that, I received my B.S. from National Taiwan University.
Research

Plenoptic Video Generation

Xiao Fu, Shitao Tang, Min Shi, Xian Liu, Jinwei Gu, Ming-Yu Liu, Dahua Lin, Chen-Hsuan Lin CVPR 2026
We tackle multi-view coherence in video re-rendering via autoregressive generation with camera-guided retrieval and self-conditioning. Retrieved observations synchronize appearance across viewpoints while ensuring temporal consistency. Our approach achieves state-of-the-art results on various benchmarks.
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

Shenyuan Gao*, William Liang*, Kaiyuan Zheng, Ayaan Malik, Seonghyeon Ye, Sihyun Yu, Wei-Cheng Tseng, Yuzhu Dong, Kaichun Mo, Chen-Hsuan Lin, Qianli Ma, Seungjun Nah, Loic Magne, Jiannan Xiang, Yuqi Xie, Ruijie Zheng, Dantong Niu, You Liang Tan, K.R. Zentner, George Kurian, Suneel Indupuru, Pooya Jannaty, Jinwei Gu, Jun Zhang, Jitendra Malik, Pieter Abbeel, Ming-Yu Liu, Yuke Zhu, Joel Jang, Linxi “Jim” Fan   (*: equal contributions) Technical report 2026
DreamDojo trains a generalist robot world model on 44,000+ hours of egocentric human video. Continuous latent actions bridge scarce robot action labels via knowledge transfer from unlabeled footage, enabling the model to support teleoperation, policy evaluation, and model-based planning.
World Simulation with Video Foundation Models for Physical AI

NVIDIA Cosmos team (Chen-Hsuan Lin: core contributor) Technical report 2025
Cosmos-Predict2.5 is a flow-based world foundation model unifying text-, image-, and video-conditioned generation at 2B/14B scales with RL refinement. Cosmos-Transfer2.5 converts structured inputs — segmentation, depth, edge maps — into high-fidelity video for simulation and data generation.
ViPE: Video Pose Engine for 3D Geometric Perception

Jiahui Huang, Qunjie Zhou, Hesam Rabeti, Aleksandr Korovko, Huan Ling, Xuanchi Ren, Tianchang Shen, Jun Gao, Dmitry Slepichev, Chen-Hsuan Lin, Jiawei Ren, Kevin Xie, Joydeep Biswas, Laura Leal-Taixe, Sanja Fidler Technical report 2025
ViPE recovers camera intrinsics, per-frame motion, and dense depth from unconstrained in-the-wild videos without any known camera parameters. It handles diverse footage from selfies to dashcam recordings and scales to auto-annotate large collections. We also release a dataset of ~96M annotated frames.
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation

Lu Ling, Chen-Hsuan Lin, Tsung-Yi Lin, Yifan Ding, Yu Zeng, Yichen Sheng, Yunhao Ge, Ming-Yu Liu, Aniket Bera, Zhaoshuo Li ICLR 2026
Scenethesis generates realistic 3D scenes from text without any task-specific training. An LLM drafts a coarse layout, vision modules generate image guidance and extract inter-object relations, and an optimization step enforces physical plausibility. The result is diverse, fully editable 3D scene arrangements.
Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin CVPR 2025
DynPose-100K is a large-scale dataset of dynamic internet videos annotated with camera poses. The pipeline uses task-specific and generalist models for filtering, then point tracking, dynamic masking, and structure-from-motion for accurate pose estimation across diverse real-world scenes.
Cosmos World Foundation Model Platform for Physical AI

NVIDIA Cosmos team (Chen-Hsuan Lin: core contributor) Best AI + Best overall of CES 2025 Technical report 2025
Cosmos is an open-source world foundation model platform for physical AI. It provides pre-trained world models, video tokenizers, post-training recipes, and a video curation pipeline — a comprehensive toolkit for the robotics and autonomous vehicle communities to build specialized world models.
Edify 3D: Scalable High-Quality 3D Asset Generation

NVIDIA Edify team (Chen-Hsuan Lin: core contributor) Technical report 2024
Edify 3D enables scalable high-quality 3D asset generation from text or image inputs. It synthesizes consistent multi-view RGB and surface normals via a diffusion model, then lifts them to 3D shape, high-resolution textures, and PBR materials — delivering a production-ready asset in under 2 minutes.
Coverage: NVIDIA | Forbes | VentureBeat | Two Minute Papers | fxguide | Animation World Network
GenUSD: 3D Scene Generation Made Easy

NVIDIA Edify team (Chen-Hsuan Lin: core contributor) SIGGRAPH 2024 Real-Time Live!
GenUSD transforms natural language prompts into realistic, fully editable 3D scenes in USD (Universal Scene Description) format. It combines an LLM for high-level scene layout planning with Edify 3D for high-quality asset generation, as demonstrated live at SIGGRAPH 2024 Real-Time Live!
Coverage: NVIDIA
ATT3D: Amortized Text-to-3D Object Synthesis

Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas ICCV 2023
Generating high-quality 3D assets from text typically requires lengthy per-prompt optimization. We instead train a unified model that amortizes this across many prompts, sharing computation and enabling knowledge transfer — unlocking smooth interpolation between text-described 3D shapes.
Neuralangelo: High-Fidelity Neural Surface Reconstruction

Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, Chen-Hsuan Lin TIME's Best Inventions of 2023 CVPR 2023
We achieve high-fidelity 3D surface reconstruction from RGB video — named a TIME Best Invention of 2023. Multi-resolution hash grids with numerical gradients and a coarse-to-fine optimization strategy recover fine-grained geometric details of large-scale scenes at unprecedented fidelity.
Coverage: NVIDIA | TIME (Best Inventions) | VentureBeat | The Verge | Engadget | WIRED | BBC Science Focus | Yahoo! News | CG Channel | Fast Company | Two Minute Papers | fxguide | Computerworld | PetaPixel | MarkTechPost | Creative Bloq
Magic3D: High-Resolution Text-to-3D Content Creation

Chen-Hsuan Lin*, Jun Gao*, Luming Tang*, Towaki Takikawa*, Xiaohui Zeng*, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin   (*: equal contributions) CVPR 2023 (highlight)
We generate high-quality 3D textured meshes from text, with support for editing and image-conditioned control. Our two-stage pipeline builds a coarse NeRF via sparse 3D hash structures, then refines to a high-resolution mesh via latent diffusion — achieving 2x faster generation than prior work.
BARF: Bundle-Adjusting Neural Radiance Fields

Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey ICCV 2021 (oral presentation)
BARF jointly trains a NeRF and registers camera poses from imperfect or unknown inputs. Inspired by classical image alignment, coarse-to-fine positional encoding progressively resolves large misalignments, enabling view synthesis and localization from completely unposed videos.
Coverage: DeepLearning.AI
Earlier Works

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

Chen-Hsuan Lin, Chaoyang Wang, Simon Lucey

Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild

Chaoyang Wang, Chen-Hsuan Lin, Simon Lucey

Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction

Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, Simon Lucey

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, Simon Lucey

Deep-LK for Efficient Adaptive Object Tracking

Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, Simon Lucey

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Chen-Hsuan Lin, Chen Kong, Simon Lucey

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior

Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, Simon Lucey

Inverse Compositional Spatial Transformer Networks

Chen-Hsuan Lin, Simon Lucey

Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image

Chen Kong, Chen-Hsuan Lin, Simon Lucey

The Conditional Lucas & Kanade Algorithm

Chen-Hsuan Lin, Rui Zhu, Simon Lucey

Ph.D. Dissertation

Learning 3D Registration and Reconstruction from the Visual World

Chen-Hsuan Lin, Carnegie Mellon University, 2021

Experience

NVIDIA

(2021 – present) Staff Research Scientist & Research Manager

Carnegie Mellon University

(2014 – 2021) Graduate Research Assistant

Facebook AI Research (now Meta AI)

(2019) Research Intern

Adobe Research

(2017, 2018) Research Intern

National Taiwan University

(2011 – 2013) Undergraduate Research Assistant

Mentorship