Chen-Hsuan Lin

Chen-Hsuan is the first name (neither just Chen nor Hsuan).
Hsuan is pronounced like "shoo-en" with a quick transition.
Staff Research Scientist & Research Manager @ NVIDIA
I am a staff research scientist and research manager in the NVIDIA Cosmos Lab, building NVIDIA Cosmos world models for physical AI. I lead a team developing interactive world models — AI systems that enable robots to perceive, simulate, and interact with the physical world in real time. My research focuses on recovering and synthesizing the 3D structure and dynamics of the visual world, work recognized as one of TIME Magazine's Best Inventions of 2023.
I received my Ph.D. in Robotics from Carnegie Mellon University, where I was supported by the NVIDIA Graduate Fellowship. Before that, I received my B.S. from National Taiwan University.
Research

Plenoptic Video Generation

Xiao Fu, Shitao Tang, Min Shi, Xian Liu, Jinwei Gu, Ming-Yu Liu, Dahua Lin, Chen-Hsuan Lin CVPR 2026
We tackle multi-view coherence in video re-rendering via autoregressive generation with camera-guided retrieval and self-conditioning. Retrieved observations synchronize appearance across viewpoints while ensuring temporal consistency. Our approach achieves state-of-the-art results on various benchmarks.
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

Shenyuan Gao*, William Liang*, Kaiyuan Zheng, Ayaan Malik, Seonghyeon Ye, Sihyun Yu, Wei-Cheng Tseng, Yuzhu Dong, Kaichun Mo, Chen-Hsuan Lin, Qianli Ma, Seungjun Nah, Loic Magne, Jiannan Xiang, Yuqi Xie, Ruijie Zheng, Dantong Niu, You Liang Tan, K.R. Zentner, George Kurian, Suneel Indupuru, Pooya Jannaty, Jinwei Gu, Jun Zhang, Jitendra Malik, Pieter Abbeel, Ming-Yu Liu, Yuke Zhu, Joel Jang, Linxi “Jim” Fan   (*: equal contributions) Technical report 2026
DreamDojo trains a generalist robot world model on 44,000+ hours of egocentric human video. Continuous latent actions bridge scarce robot action labels via knowledge transfer from unlabeled footage, enabling the model to support teleoperation, policy evaluation, and model-based planning.
World Simulation with Video Foundation Models for Physical AI

NVIDIA Cosmos team (Chen-Hsuan Lin: core contributor) Technical report 2025
Cosmos-Predict2.5 is a flow-based world foundation model unifying text-, image-, and video-conditioned generation at 2B/14B scales with RL refinement. Cosmos-Transfer2.5 converts structured inputs — segmentation, depth, edge maps — into high-fidelity video for simulation and data generation.
ViPE: Video Pose Engine for 3D Geometric Perception

Jiahui Huang, Qunjie Zhou, Hesam Rabeti, Aleksandr Korovko, Huan Ling, Xuanchi Ren, Tianchang Shen, Jun Gao, Dmitry Slepichev, Chen-Hsuan Lin, Jiawei Ren, Kevin Xie, Joydeep Biswas, Laura Leal-Taixe, Sanja Fidler Technical report 2025
ViPE recovers camera intrinsics, per-frame motion, and dense depth from unconstrained in-the-wild videos without any known camera parameters. It handles diverse footage from selfies to dashcam recordings and scales to auto-annotate large collections. We also release a dataset of ~96M annotated frames.
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation

Lu Ling, Chen-Hsuan Lin, Tsung-Yi Lin, Yifan Ding, Yu Zeng, Yichen Sheng, Yunhao Ge, Ming-Yu Liu, Aniket Bera, Zhaoshuo Li ICLR 2026
Scenethesis generates realistic 3D scenes from text without any task-specific training. An LLM drafts a coarse layout, vision modules generate image guidance and extract inter-object relations, and an optimization step enforces physical plausibility. The result is diverse, fully editable 3D scene arrangements.
Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin CVPR 2025
DynPose-100K is a large-scale dataset of dynamic internet videos annotated with camera poses. The pipeline uses task-specific and generalist models for filtering, then point tracking, dynamic masking, and structure-from-motion for accurate pose estimation across diverse real-world scenes.
Cosmos World Foundation Model Platform for Physical AI

NVIDIA Cosmos team (Chen-Hsuan Lin: core contributor) Best AI + Best overall of CES 2025 Technical report 2025
Cosmos is an open-source world foundation model platform for physical AI. It provides pre-trained world models, video tokenizers, post-training recipes, and a video curation pipeline — a comprehensive toolkit for the robotics and autonomous vehicle communities to build specialized world models.
Edify 3D: Scalable High-Quality 3D Asset Generation

NVIDIA Edify team (Chen-Hsuan Lin: core contributor) Technical report 2024
Edify 3D enables scalable high-quality 3D asset generation from text or image inputs. It synthesizes consistent multi-view RGB and surface normals via a diffusion model, then lifts them to 3D shape, high-resolution textures, and PBR materials — delivering a production-ready asset in under 2 minutes.
Coverage: NVIDIA | Forbes | VentureBeat | Two Minute Papers | fxguide | Animation World Network
GenUSD: 3D Scene Generation Made Easy

NVIDIA Edify team (Chen-Hsuan Lin: core contributor) SIGGRAPH 2024 Real-Time Live!
GenUSD transforms natural language prompts into realistic, fully editable 3D scenes in USD (Universal Scene Description) format. It combines an LLM for high-level scene layout planning with Edify 3D for high-quality asset generation, as demonstrated live at SIGGRAPH 2024 Real-Time Live!
Coverage: NVIDIA
ATT3D: Amortized Text-to-3D Object Synthesis

Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas ICCV 2023
Generating high-quality 3D assets from text typically requires lengthy per-prompt optimization. We instead train a unified model that amortizes this across many prompts, sharing computation and enabling knowledge transfer — unlocking smooth interpolation between text-described 3D shapes.
Neuralangelo: High-Fidelity Neural Surface Reconstruction

Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, Chen-Hsuan Lin TIME's Best Inventions of 2023 CVPR 2023
We achieve high-fidelity 3D surface reconstruction from RGB video — named a TIME Best Invention of 2023. Multi-resolution hash grids with numerical gradients and a coarse-to-fine optimization strategy recover fine-grained geometric details of large-scale scenes at unprecedented fidelity.
Coverage: NVIDIA | TIME (Best Inventions) | VentureBeat | The Verge | Engadget | WIRED | BBC Science Focus | Yahoo! News | CG Channel | Fast Company | Two Minute Papers | fxguide | Computerworld | PetaPixel | MarkTechPost | Creative Bloq
Magic3D: High-Resolution Text-to-3D Content Creation

Chen-Hsuan Lin*, Jun Gao*, Luming Tang*, Towaki Takikawa*, Xiaohui Zeng*, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin   (*: equal contributions) CVPR 2023 (highlight)
We generate high-quality 3D textured meshes from text, with support for editing and image-conditioned control. Our two-stage pipeline builds a coarse NeRF via sparse 3D hash structures, then refines to a high-resolution mesh via latent diffusion — achieving 2x faster generation than prior work.
BARF: Bundle-Adjusting Neural Radiance Fields

Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey ICCV 2021 (oral presentation)
BARF jointly trains a NeRF and registers camera poses from imperfect or unknown inputs. Inspired by classical image alignment, coarse-to-fine positional encoding progressively resolves large misalignments, enabling view synthesis and localization from completely unposed videos.
Coverage: DeepLearning.AI
Earlier Works

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

Chen-Hsuan Lin, Chaoyang Wang, Simon Lucey

Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild

Chaoyang Wang, Chen-Hsuan Lin, Simon Lucey

Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction

Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, Simon Lucey

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, Simon Lucey

Deep-LK for Efficient Adaptive Object Tracking

Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, Simon Lucey

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Chen-Hsuan Lin, Chen Kong, Simon Lucey

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior

Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, Simon Lucey

Inverse Compositional Spatial Transformer Networks

Chen-Hsuan Lin, Simon Lucey

Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image

Chen Kong, Chen-Hsuan Lin, Simon Lucey

The Conditional Lucas & Kanade Algorithm

Chen-Hsuan Lin, Rui Zhu, Simon Lucey

Ph.D. Dissertation

Learning 3D Registration and Reconstruction from the Visual World

Chen-Hsuan Lin, Carnegie Mellon University, 2021

Experience

NVIDIA

(2021 – present) Staff Research Scientist & Research Manager

Carnegie Mellon University

(2014 – 2021) Graduate Research Assistant

Facebook AI Research (now Meta AI)

(2019) Research Intern

Adobe Research

(2017, 2018) Research Intern

National Taiwan University

(2011 – 2013) Undergraduate Research Assistant

Mentorship