Chen-Hsuan Lin

The first name is Chen-Hsuan (neither just Chen nor Hsuan).
Hsuan is pronounced like "shoo-en" with a quick transition.
I am a research scientist at NVIDIA Research, working on computer vision, computer graphics, and artificial intelligence. I received my Ph.D. in Robotics from Carnegie Mellon University, where I was advised by Simon Lucey and supported by the NVIDIA Graduate Fellowship. During my Ph.D., I also interned at Facebook AI Research and Adobe Research. Before that, I received my M.S. in Robotics from CMU and my B.S. in Electrical Engineering from National Taiwan University.
I am interested in solving 3D reconstruction and view synthesis problems using neural rendering and self-supervised learning techniques. My research goal is to empower AI systems with dense 3D perception and imagination abilities by learning from visual data in the wild, advancing toward the next level of visual and 3D spatial artificial intelligence.
Email:  chenhsuanl (at) nvidia (dot) com

Updates

10/2021
The oral presentation of our ICCV 2021 paper BARF is now online here!
10/2021
I gave a talk (in person!) at the MIT vision & graphics seminar (recording here).
08/2021
The code of our ICCV 2021 paper BARF is now released; check it out here!
08/2021
I have joined NVIDIA Research as a research scientist!
07/2021
I have one paper accepted to ICCV 2021 as an oral presentation!
06/2021
I have defended my Ph.D. thesis! The thesis and dissertation talk are now online.
04/2021
We released our latest work on training NeRF from unknown camera poses!
10/2020
The code and short talk of our NeurIPS 2020 paper are online! Check out the project page.
older updates...

Research

BARF: Bundle-Adjusting Neural Radiance Fields

Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey
IEEE International Conference on Computer Vision (ICCV), 2021 (oral presentation)
paper arXiv project page presentation code BibTeX
@inproceedings{lin2021barf,
  title={BARF: Bundle-Adjusting Neural Radiance Fields},
  author={Lin, Chen-Hsuan and Ma, Wei-Chiu and Torralba, Antonio and Lucey, Simon},
  booktitle={IEEE International Conference on Computer Vision ({ICCV})},
  year={2021}
}
Neural Radiance Fields can be trained from unknown camera poses! Inspired by classical image alignment, we show that coarse-to-fine optimization is simple yet effective for joint registration and reconstruction on 3D scene representations.
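The coarse-to-fine idea can be sketched as a frequency-annealed positional encoding: low-frequency bands are enabled first, and higher ones are smoothly blended in as the annealing parameter grows. Below is a minimal NumPy sketch; the function names are mine, and this paraphrases the paper's weighting scheme rather than reproducing the released code.

```python
import numpy as np

def coarse_to_fine_weights(alpha, num_freqs):
    """Per-band weights: alpha anneals from 0 to num_freqs over training,
    smoothly enabling low frequencies before high ones."""
    k = np.arange(num_freqs)
    t = np.clip(alpha - k, 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * t))

def positional_encoding(x, alpha, num_freqs=10):
    """Annealed sinusoidal encoding of coordinates x with shape (..., D)."""
    w = coarse_to_fine_weights(alpha, num_freqs)          # (L,)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi         # (L,)
    ang = x[..., None, :] * freqs[:, None]                # (..., L, D)
    enc = np.concatenate([np.sin(ang), np.cos(ang)], -1)  # (..., L, 2D)
    return (w[:, None] * enc).reshape(*x.shape[:-1], -1)
```

With alpha = 0 all bands are masked out (a smooth signal for pose registration); with alpha = num_freqs the encoding reduces to the standard full-frequency one.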

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

Chen-Hsuan Lin, Chaoyang Wang, and Simon Lucey
Advances in Neural Information Processing Systems (NeurIPS), 2020
paper arXiv project page code BibTeX
@inproceedings{lin2020sdfsrn,
  title={SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images},
  author={Lin, Chen-Hsuan and Wang, Chaoyang and Lucey, Simon},
  booktitle={Advances in Neural Information Processing Systems ({NeurIPS})},
  year={2020}
}
Implicit 3D shape reconstruction can be trained from static image collections without multi-view supervision! We establish the geometric connection of 2D silhouettes to 3D SDF shapes for scalable single-view training on real-world image data.
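The silhouette-to-SDF connection can be illustrated with a toy example: a pixel lies inside the object's 2D silhouette exactly when its camera ray hits the surface, i.e. the minimum SDF along the ray is non-positive. This is my own illustrative sketch (a sphere SDF with uniform ray samples), not the paper's rendering network.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

def inside_silhouette(origin, direction, sdf_fn, n_samples=128, far=5.0):
    """True iff the camera ray hits the surface, i.e. min SDF <= 0."""
    t = np.linspace(0.0, far, n_samples)
    pts = np.asarray(origin, float) + t[:, None] * np.asarray(direction, float)
    return bool(sdf_fn(pts).min() <= 0.0)
```

A ray aimed at the sphere reports an inside-silhouette pixel, while a ray that misses it does not; for rays missing the silhouette, the distance to the silhouette boundary lower-bounds the SDF along the ray, which is the geometric link exploited for supervision.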

Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild

Chaoyang Wang, Chen-Hsuan Lin, and Simon Lucey
IEEE International Conference on 3D Vision (3DV), 2020 (oral presentation)
paper (arXiv) BibTeX
@inproceedings{wang2020deep,
  title={Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild},
  author={Wang, Chaoyang and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE International Conference on 3D Vision ({3DV})},
  year={2020}
}
A self-supervised framework for 3D structure and pose recovery from 2D landmarks, closely related to hierarchical block-sparse coding in non-rigid structure from motion. Our method can handle perspective camera models and missing data.

Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction

Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
paper arXiv project page code BibTeX
@inproceedings{lin2019photometric,
  title={Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction},
  author={Lin, Chen-Hsuan and Wang, Oliver and Russell, Bryan C and Shechtman, Eli and Kim, Vladimir G and Fisher, Matthew and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2019}
}
3D mesh reconstruction from RGB videos using photometric optimization with learned shape priors. This allows 3D object meshes to deform in a learned shape space while being pixel-aligned against RGB videos without depth or silhouettes.

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
paper arXiv project page code BibTeX
@inproceedings{lin2018stgan,
  title={ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing},
  author={Lin, Chen-Hsuan and Yumer, Ersin and Wang, Oliver and Shechtman, Eli and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2018}
}
GANs can learn to correct the geometry of objects and create realistic image composites! Our method discovers plausible geometric configurations of objects driven solely by appearance realism, where ground-truth supervision is unavailable.

Deep-LK for Efficient Adaptive Object Tracking

Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, and Simon Lucey
IEEE International Conference on Robotics and Automation (ICRA), 2018
paper arXiv BibTeX
@inproceedings{wang2018deeplk,
  title={Deep-LK for Efficient Adaptive Object Tracking},
  author={Wang, Chaoyang and Galoogahi, Hamed Kiani and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE International Conference on Robotics and Automation ({ICRA})},
  year={2018}
}
We can use Siamese neural networks to learn general object tracking by optimizing a registration-based objective function. The learned feature representation adapts its regression parameters online with respect to the tracked templates.

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Chen-Hsuan Lin, Chen Kong, and Simon Lucey
AAAI Conference on Artificial Intelligence (AAAI), 2018 (oral presentation)
paper arXiv project page code BibTeX
@inproceedings{lin2018learning,
  title={Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction},
  author={Lin, Chen-Hsuan and Kong, Chen and Lucey, Simon},
  booktitle={AAAI Conference on Artificial Intelligence ({AAAI})},
  year={2018}
}
We design a novel differentiable point cloud renderer to approximate the rasterization of dense 3D point clouds generated by a 2D convolutional neural network, so that the generated point clouds can be supervised from training depth images.
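The depth-supervision idea can be sketched as projecting the generated points to a depth map with z-buffering; comparing that map against a training depth image yields the loss. This NumPy toy uses hard rounding, whereas the paper's pseudo-renderer makes the splatting step differentiable (via upsampling), which this sketch omits.

```python
import numpy as np

def render_depth(points, focal, H, W):
    """Splat camera-space 3D points (N, 3) to an (H, W) depth map,
    keeping the closest point per pixel (z-buffering). Empty pixels
    stay at +inf."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = np.round(focal * x / z + W / 2).astype(int)
    v = np.round(focal * y / z + H / 2).astype(int)
    ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth = np.full((H, W), np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), z[ok])  # scatter-min z-buffer
    return depth
```

The scatter-min (`np.minimum.at`) resolves occlusion when multiple points land on the same pixel, which is exactly the rasterization the dense point clouds require.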

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior

Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, and Simon Lucey
IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
paper arXiv extension paper BibTeX
@inproceedings{zhu2017object,
  title={Object-Centric Photometric Bundle Adjustment with Deep Shape Prior},
  author={Zhu, Rui and Wang, Chaoyang and Lin, Chen-Hsuan and Wang, Ziyan and Lucey, Simon},
  booktitle={IEEE Winter Conference on Applications of Computer Vision ({WACV})},
  year={2018}
}
3D shape prediction networks can be utilized as a strong semantic prior for object-centric photometric bundle adjustment. We use pretrained 3D point cloud generators to align shapes to videos within an optimization-based inference framework.

Inverse Compositional Spatial Transformer Networks

Chen-Hsuan Lin and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (oral presentation)
paper arXiv project page presentation code BibTeX
@inproceedings{lin2017inverse,
  title={Inverse Compositional Spatial Transformer Networks},
  author={Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2017}
}
A redesign of Spatial Transformer Networks inspired by the Lucas-Kanade algorithm. With the same network architecture, our method learns recurrent spatial transformations to resolve geometric redundancies for efficient visual recognition.
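The inverse-compositional recurrence works as follows: each iteration predicts a warp update from the currently warped image, but only the single composed warp ever resamples the original image, avoiding the boundary-effect loss of repeated rewarping. A minimal sketch with affine warps, where the warp and predictor functions are placeholders rather than the paper's architecture:

```python
import numpy as np

def compose_affine(p, dp):
    """Compose two affine warps given as 2x3 matrices (p followed by dp)."""
    A = np.vstack([p, [0.0, 0.0, 1.0]])
    B = np.vstack([dp, [0.0, 0.0, 1.0]])
    return (A @ B)[:2]

IDENTITY = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

def icstn_forward(image, warp_fn, predict_fn, num_iters=4):
    """Recurrently refine a warp; only the final composed warp is
    applied to the ORIGINAL image."""
    p = IDENTITY.copy()
    for _ in range(num_iters):
        dp = predict_fn(warp_fn(image, p))  # update from current warp
        p = compose_affine(p, dp)
    return warp_fn(image, p), p
```

Because updates accumulate in warp-parameter space rather than in pixel space, the same predictor network can be unrolled for any number of iterations at test time.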

Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image

Chen Kong, Chen-Hsuan Lin, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
paper BibTeX
@inproceedings{kong2017using,
  title={Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image},
  author={Kong, Chen and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2017}
}
A 3D shape reconstruction method based on CAD model retrieval. By matching keypoint projections from the input image, we can use local landmark correspondences to solve for sparse linear combinations of a prebuilt CAD model dictionary.

The Conditional Lucas & Kanade Algorithm

Chen-Hsuan Lin, Rui Zhu, and Simon Lucey
European Conference on Computer Vision (ECCV), 2016
paper arXiv project page code BibTeX
@inproceedings{lin2016conditional,
  title={The Conditional Lucas \& Kanade Algorithm},
  author={Lin, Chen-Hsuan and Zhu, Rui and Lucey, Simon},
  booktitle={European Conference on Computer Vision ({ECCV})},
  pages={793--808},
  year={2016},
  organization={Springer International Publishing}
}
A learning-based image registration method inspired by the seminal Lucas-Kanade algorithm. With structured optimization and a conditional loss, our method significantly improves over classical synthesis-based optimization objective functions.

Ph.D. Dissertation

Learning 3D Registration and Reconstruction from the Visual World

Chen-Hsuan Lin
Carnegie Mellon University, 2021
thesis dissertation talk BibTeX
@phdthesis{lin2021learning,
  title={Learning 3D Registration and Reconstruction from the Visual World},
  author={Lin, Chen-Hsuan},
  year={2021},
  month={June},
  school={The Robotics Institute, Carnegie Mellon University},
  address={Pittsburgh, PA},
  number={CMU-RI-TR-21-13},
}

Experiences

NVIDIA Research, 2021 – present
Research Scientist
Research in dense 3D reconstruction, self-supervised learning, and neural rendering.
Carnegie Mellon University, 2014 – 2021
Graduate Research Assistant (with Simon Lucey)
Research in geometric image registration, dense 3D reconstruction, and self-supervised learning.
Facebook AI Research, 2019
Research Intern (with Kaiming He, Georgia Gkioxari, and Justin Johnson)
Learning 3D-aware feature representations for improving standard 2D object detection systems.
Adobe Research, 2018
Research Intern (with Oliver Wang, Bryan Russell, Eli Shechtman, Vladimir Kim, and Matthew Fisher)
Photometric optimization of 3D object meshes for shape reconstruction aligned to RGB videos.
Adobe Research, 2017
Research Intern (with Eli Shechtman, Oliver Wang, and Ersin Yumer)
Learning geometric corrections of composited objects in images driven by appearance realism.
National Taiwan University, 2011 – 2013
Undergraduate Research Assistant (with Homer H. Chen)
Designing rate-distortion optimization for video compression based on perceptual quality metrics.

Teaching

Visual Learning and Recognition (CMU 16-824), Spring 2019
Teaching Assistant / Graduate Student Instructor (with Abhinav Gupta)
(Lectures: 3D Vision & 3D Reasoning, Semantic Segmentation & Pixel Labeling)
Computer Vision (CMU 16-720 A/B), Fall 2017
Head Teaching Assistant (with Srinivasa Narasimhan, Simon Lucey, and Yaser Sheikh)
Designing Computer Vision Apps (CMU 16-423), Fall 2015
Teaching Assistant (with Simon Lucey)

Academic Projects

Towards a More Curious Agent

CMU 10-703 Deep Reinforcement Learning & Control
paper
We model agents' intrinsic rewards with the causal distribution of visual observations for policy networks, solving navigation problems with very sparse rewards.

Disentangler Networks with Absolute and Relative Attributes

CMU 16-824 Visual Learning & Recognition
paper
A neural network that disentangles image embeddings into controllable attributes for image manipulation, learnable from relative ranking supervision.

3D Facial Model Fitting from 2D Videos

CMU CI2CV Computer Vision Lab
video
A 3D reconstruction system for metric-scale faces from self-captured 2D videos, solving for sparse 3D facial landmarks followed by dense 3D mesh fitting.

Video Summarization via Convolutional Neural Networks

CMU 10-701 Machine Learning
report
We design a new objective for end-to-end learning of video summarization, which allows K-means clustering of input video frames in the latent space at test time.

Perceptual Rate-Distortion Optimization of Motion Estimation

NTU Multimedia Processing & Communications Lab
paper
An optimization framework for video coding to find the optimal tradeoff between the encoding bitrate and the decoding distortion using perceptual metrics.

Virtual Piano Keyboard System

NTU Digital Circuit Design Lab
presentation demo
A sensor-based virtual instrument system using only a paper keyboard, with real-time fingertip and keyboard pattern recognition on raw CCD sensor input.


© designed by Chen-Hsuan Lin.