Chen-Hsuan Lin

I am a 2nd-year Ph.D. student in the Robotics Institute at Carnegie Mellon University, working with Prof. Simon Lucey on computer vision & deep learning research. Before that, I completed my M.S. in Robotics at CMU and B.S. in Electrical Engineering at National Taiwan University.

I am interested in solving 3D vision problems using spatial alignment and generative modeling techniques. My research goal is to enable geometric factorization and reconstruction of the 3D world from visual data, and in turn to improve learning efficiency in visual recognition tasks.

I am a recipient of the NVIDIA Graduate Fellowship and the Amazon Go Graduate Fellowship. During internships at Adobe Research, I have also collaborated closely with Eli Shechtman, Oliver Wang, Ersin Yumer, Bryan Russell, Vova Kim, and Matthew Fisher.

CV (last updated: 04/2019) GitHub

Contact email: chlin (at) cmu (dot) edu

Updates

04/2019 I will be joining Facebook AI Research for an internship this summer.
03/2019 I received the inaugural Amazon Go Graduate Fellowship!
02/2019 I have one paper accepted to CVPR 2019!
12/2018 I received the NVIDIA Graduate Fellowship (10 selected worldwide). Thanks NVIDIA!
04/2018 I will be joining Adobe Research for a second internship this summer.
02/2018 I have one paper accepted to CVPR 2018!
older updates...

Research

Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction

Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
paper arXiv preprint project page code BibTeX
@inproceedings{lin2019photometric,
  title={Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction},
  author={Lin, Chen-Hsuan and Wang, Oliver and Russell, Bryan C and Shechtman, Eli and Kim, Vladimir G and Fisher, Matthew and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2019}
}
We propose to combine multi-view geometric methods with data-driven shape priors for 3D mesh reconstruction from RGB video frames. We pose photometric mesh optimization as a piecewise 2D image alignment problem, which enables 3D meshes to deform and align against 2D video sequences without any depth or mask information.
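A minimal sketch of the photometric objective at the core of this idea, assuming hypothetical camera intrinsics, poses, and mesh surface samples (this is not the released implementation): points on the mesh are projected into two nearby frames, colors are bilinearly sampled, and the photometric difference is backpropagated to the mesh deformation.

import torch
import torch.nn.functional as F

def project(points, K, pose):
    # Project Nx3 world points into pixels with a 3x4 pose [R|t].
    cam = points @ pose[:, :3].T + pose[:, 3]
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]

def sample_colors(image, pix, H, W):
    # Bilinearly sample a 3xHxW image at Nx2 pixel locations.
    grid = torch.stack([pix[:, 0] / (W - 1) * 2 - 1,
                        pix[:, 1] / (H - 1) * 2 - 1], dim=1).view(1, -1, 1, 2)
    return F.grid_sample(image[None], grid, align_corners=True).view(3, -1)

H = W = 64
frame_a, frame_b = torch.rand(3, H, W), torch.rand(3, H, W)   # placeholder frames
K = torch.tensor([[50., 0., 32.], [0., 50., 32.], [0., 0., 1.]])
pose_a = torch.cat([torch.eye(3), torch.tensor([[0.], [0.], [2.]])], dim=1)
pose_b = pose_a.clone(); pose_b[0, 3] += 0.1                  # slightly shifted view

V = torch.rand(100, 3) - 0.5                  # stand-in for mesh surface samples
dV = torch.zeros_like(V, requires_grad=True)  # mesh deformation being optimized
optim = torch.optim.Adam([dV], lr=1e-3)
for _ in range(100):
    pts = V + dV
    loss = (sample_colors(frame_a, project(pts, K, pose_a), H, W)
            - sample_colors(frame_b, project(pts, K, pose_b), H, W)).abs().mean()
    optim.zero_grad(); loss.backward(); optim.step()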

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
paper arXiv preprint project page code BibTeX
@inproceedings{lin2018stgan,
  title={ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing},
  author={Lin, Chen-Hsuan and Yumer, Ersin and Wang, Oliver and Shechtman, Eli and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2018}
}
We propose a novel GAN architecture that leverages Spatial Transformer Networks to predict geometric corrections to objects, creating realistic composite images. We demonstrate the efficacy of our method on various applications where ground-truth supervision is unavailable and the geometric predictions are driven purely by appearance realism.
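A sketch of the iterative geometric correction, assuming an illustrative generator architecture and 4-channel RGBA foregrounds (a simplification, not the paper's networks): the generator regresses a small affine warp update from the current composite, the foreground is warped by a differentiable spatial transformer, and the re-composited image would then be scored by a discriminator for realism.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpGenerator(nn.Module):
    # Regress a 6-dof affine warp update from the composite + foreground mask.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6))
    def forward(self, x):
        eye = torch.tensor([1., 0., 0., 0., 1., 0.], device=x.device)
        return (eye + 0.1 * self.net(x)).view(-1, 2, 3)  # identity + small update

def composite(bg, fg):
    alpha = fg[:, 3:4]
    return fg[:, :3] * alpha + bg * (1 - alpha)

gen = WarpGenerator()
bg, fg = torch.rand(2, 3, 64, 64), torch.rand(2, 4, 64, 64)  # placeholder inputs
comp = composite(bg, fg)
for _ in range(4):                                   # iterative warp corrections
    theta = gen(torch.cat([comp, fg[:, 3:4]], dim=1))
    grid = F.affine_grid(theta, fg.shape, align_corners=False)
    fg = F.grid_sample(fg, grid, align_corners=False)    # spatial transformer
    comp = composite(bg, fg)
# `comp` is then judged by a discriminator, so realism alone drives the warps.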

Deep-LK for Efficient Adaptive Object Tracking

Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, and Simon Lucey
IEEE International Conference on Robotics and Automation (ICRA), 2018
paper arXiv preprint BibTeX
@inproceedings{wang2018deeplk,
  title={Deep-LK for Efficient Adaptive Object Tracking},
  author={Wang, Chaoyang and Galoogahi, Hamed Kiani and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE International Conference on Robotics and Automation ({ICRA})},
  year={2018}
}
We demonstrate a theoretical relationship between Siamese regression networks and the Lucas-Kanade algorithm for object tracking. Combining the best of both worlds, we propose a novel tracking framework that learns a feature representation whose regression parameters adapt online to the tracked template images.
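In its simplest form, the relationship can be seen in a single alignment step: a Lucas-Kanade update solves a linear least-squares system mapping the feature difference between template and image to a warp update, which has the same linear form as a regression layer, except the "weights" can be recomputed from each new template. A toy sketch with illustrative sizes:

import torch

def lk_step(template_feat, image_feat, jacobian):
    # One Lucas-Kanade update: solve J dp = phi(I) - phi(T) in the
    # least-squares sense; pinv(J) plays the role of regression weights.
    residual = (image_feat - template_feat).reshape(-1, 1)   # (D, 1)
    return torch.linalg.lstsq(jacobian, residual).solution.squeeze(1)

D, P = 512, 8                     # feature dims, warp parameters (illustrative)
jacobian = torch.randn(D, P)      # stand-in for the template's feature Jacobian
template_feat, image_feat = torch.randn(D), torch.randn(D)
dp = lk_step(template_feat, image_feat, jacobian)
print(dp.shape)                   # torch.Size([8])

Recomputing the pseudo-inverse from each new template is what makes the regression adaptive, in contrast to a Siamese regressor with fixed learned weights.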

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Chen-Hsuan Lin, Chen Kong, and Simon Lucey
AAAI Conference on Artificial Intelligence (AAAI), 2018 (oral presentation)
paper arXiv preprint project page code BibTeX
@inproceedings{lin2018learning,
  title={Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction},
  author={Lin, Chen-Hsuan and Kong, Chen and Lucey, Simon},
  booktitle={AAAI Conference on Artificial Intelligence ({AAAI})},
  year={2018}
}
We propose a simple generator network using 2D convolutional operations to efficiently reconstruct 3D shapes in the form of dense point clouds. During optimization, we jointly apply geometric reasoning with 2D projection through the pseudo-renderer, a novel differentiable module that approximates the true rendering operation to synthesize novel depth images.
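A minimal sketch of the pseudo-rendering idea under simplified assumptions (the paper resolves pixel collisions differently, at a higher target resolution; names here are illustrative): project the point cloud into the target view and keep the nearest depth per pixel with a scatter-based z-buffer.

import torch

def pseudo_render(points, K, H, W):
    # Approximate a depth image: project points, then keep the minimum
    # depth among all points landing on each pixel (a discrete z-buffer).
    pix = points @ K.T
    depth = pix[:, 2]
    uv = (pix[:, :2] / depth[:, None]).round().long()
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H) & (depth > 0)
    uv, depth = uv[ok], depth[ok]
    flat = uv[:, 1] * W + uv[:, 0]             # flattened pixel indices
    depth_map = torch.full((H * W,), float('inf'))  # empty pixels stay at inf
    depth_map.scatter_reduce_(0, flat, depth, reduce='amin')
    return depth_map.view(H, W)

K = torch.tensor([[50., 0., 32.], [0., 50., 32.], [0., 0., 1.]])
points = torch.rand(2048, 3) * 0.5 + torch.tensor([0., 0., 2.])  # in front of camera
print(pseudo_render(points, K, 64, 64).shape)   # torch.Size([64, 64])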

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior

Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, and Simon Lucey
IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
paper arXiv preprint BibTeX
@inproceedings{zhu2017object,
  title={Object-Centric Photometric Bundle Adjustment with Deep Shape Prior},
  author={Zhu, Rui and Wang, Chaoyang and Lin, Chen-Hsuan and Wang, Ziyan and Lucey, Simon},
  booktitle={IEEE Winter Conference on Applications of Computer Vision ({WACV})},
  year={2018}
}
We introduce learned shape priors in the form of deep neural networks into the Photometric Bundle Adjustment framework. We propose to accommodate 3D point clouds generated by the shape prior within the optimization-based inference framework, allowing the generated shapes to adapt to the given video sequences.
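A compact sketch of the inference idea, with a placeholder objective and a hypothetical pretrained decoder (not the paper's networks or its photometric term): the optimization variable is the latent code of the shape prior rather than the points themselves, so the shape stays on the learned manifold while adapting to the observations.

import torch
import torch.nn as nn

# Hypothetical pretrained shape prior: latent code z -> point cloud (1024 x 3).
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1024 * 3))

def data_term(points, observations):
    # Placeholder for the photometric reprojection error used in the paper.
    return ((points.view(-1, 3) - observations) ** 2).mean()

observations = torch.rand(1024, 3)            # stand-in for image evidence
z = torch.zeros(64, requires_grad=True)       # optimize the latent code only
optim = torch.optim.Adam([z], lr=1e-2)
for _ in range(200):
    loss = data_term(decoder(z), observations)
    optim.zero_grad(); loss.backward(); optim.step()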

Inverse Compositional Spatial Transformer Networks

Chen-Hsuan Lin and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (oral presentation)
paper arXiv preprint project page presentation code BibTeX
@inproceedings{lin2017inverse,
  title={Inverse Compositional Spatial Transformer Networks},
  author={Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2017}
}
Inspired by the classical Lucas-Kanade algorithm for alignment, we improve upon Spatial Transformer Networks to jointly learn discriminative tasks and resolve large geometric redundancies through recurrent spatial transformations. Superior performance is demonstrated in various pure image alignment and joint alignment/classification tasks.
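A sketch of the recurrent alignment scheme with an illustrative predictor network (simplified from the paper): warp updates predicted from the currently warped image are composed in parameter space, and each step resamples the original input with the accumulated warp, echoing the inverse compositional trick of avoiding repeated resampling losses.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpPredictor(nn.Module):
    # Predict a 6-dof affine warp update from the currently warped image.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 6))
    def forward(self, x):
        return 0.1 * self.net(x)            # small updates

def to_matrix(p):
    # Lift 6 parameters to a 3x3 matrix: identity + update, zero bottom row.
    eye = torch.eye(3, device=p.device).expand(p.shape[0], 3, 3)
    return eye + F.pad(p.view(-1, 2, 3), (0, 0, 0, 1))

def ic_stn(image, predictor, steps=4):
    theta = torch.eye(3, device=image.device).expand(image.shape[0], 3, 3)
    warped = image
    for _ in range(steps):
        theta = to_matrix(predictor(warped)) @ theta     # compose parameters
        grid = F.affine_grid(theta[:, :2], image.shape, align_corners=False)
        warped = F.grid_sample(image, grid, align_corners=False)  # from the original
    return warped, theta

image = torch.rand(2, 1, 32, 32)
warped, theta = ic_stn(image, WarpPredictor())
print(warped.shape, theta.shape)  # torch.Size([2, 1, 32, 32]) torch.Size([2, 3, 3])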

Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image

Chen Kong, Chen-Hsuan Lin, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
paper BibTeX
@inproceedings{kong2017using,
  title={Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image},
  author={Kong, Chen and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2017}
}
We estimate the dense 3D shape of an object from a single image, given a set of 2D landmarks and the object silhouette. Rather than selecting the single CAD model closest to the projected image, we propose a novel graph embedding based on local landmark correspondences that allows for sparse linear combinations of a CAD model dictionary.
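A toy sketch of the sparse-combination step, with a random stand-in dictionary (the paper's graph embedding and correspondence machinery are omitted): ISTA solves for a sparse set of weights over the dictionary atoms.

import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_fit(D, y, lam=0.1, steps=500):
    # ISTA for min_w 0.5*||D w - y||^2 + lam*||w||_1: a sparse linear
    # combination of dictionary atoms (here, vectorized landmark sets).
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(steps):
        w = soft_threshold(w - D.T @ (D @ w - y) / L, lam / L)
    return w

rng = np.random.default_rng(0)
D = rng.normal(size=(60, 20))               # 20 CAD atoms, 30 2D landmarks each
y = D @ np.array([0.7, 0.3] + [0.0] * 18)   # ground truth uses only 2 atoms
w = sparse_fit(D, y)
print(np.nonzero(np.abs(w) > 1e-3)[0])      # recovers a sparse support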

The Conditional Lucas & Kanade Algorithm

Chen-Hsuan Lin, Rui Zhu, and Simon Lucey
European Conference on Computer Vision (ECCV), 2016
paper arXiv preprint project page code BibTeX
@inproceedings{lin2016conditional,
  title={The Conditional Lucas \& Kanade Algorithm},
  author={Lin, Chen-Hsuan and Zhu, Rui and Lucey, Simon},
  booktitle={European Conference on Computer Vision ({ECCV})},
  pages={793--808},
  year={2016},
  organization={Springer International Publishing}
}
We propose an alignment algorithm, inspired by the Lucas-Kanade algorithm and the Supervised Descent Method, that learns from a conditional objective function, achieving significant improvement even when learned from little training data. Superior performance is also achieved in applications including template tracking and facial landmark alignment.
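A toy sketch of the learn-from-perturbations ingredient shared with SDM (the linear appearance model and all sizes are stand-ins, not the paper's conditional formulation): sample warp perturbations, record the induced appearance differences, and fit a ridge regression mapping appearance residuals back to corrective updates.

import numpy as np

rng = np.random.default_rng(0)
P, D, N = 4, 100, 2000                 # warp params, feature dims, samples
A = rng.normal(size=(D, P))            # stand-in linear appearance model

def appearance(p):
    return A @ p + 0.01 * rng.normal(size=D)    # noisy features at warp p

# Training: perturb the warp, regress the corrective update from the
# induced appearance difference (ridge regression).
dp = rng.normal(scale=0.5, size=(N, P))
dphi = np.stack([appearance(d) - appearance(np.zeros(P)) for d in dp])
W = np.linalg.solve(dphi.T @ dphi + 1e-3 * np.eye(D), dphi.T @ (-dp))

# Testing: iteratively apply the learned descent updates.
p_true = np.array([0.3, -0.2, 0.1, 0.0])
target, p = appearance(p_true), np.zeros(P)
for _ in range(5):
    p = p + (appearance(p) - target) @ W
print(np.round(p, 2))                  # approaches p_true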

Teaching

Visual Learning and Recognition (CMU 16-824), Spring 2019
Teaching Assistant & Lecturer (with Prof. Abhinav Gupta)
Computer Vision (CMU 16-720 A/B), Fall 2017
Head Teaching Assistant (with Prof. Srinivasa Narasimhan, Prof. Simon Lucey, Prof. Yaser Sheikh)
Designing Computer Vision Apps (CMU 16-423), Fall 2015
Teaching Assistant (with Prof. Simon Lucey)

Academic Projects

Towards a More Curious Agent

CMU 10-703 Deep Reinforcement Learning & Control
paper
We formulate an agent's intrinsic reward using the statistical distribution of visual observations, with convolutional causality modeling for the policy networks. Our algorithm can solve navigation problems with very sparse rewards in VizDoom.
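One common way to instantiate such curiosity-style intrinsic rewards (a forward-model sketch under assumed shapes, not necessarily this project's exact formulation) is to reward the error of predicting the next observation embedding from the current one and the action:

import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))  # observation encoder
forward_model = nn.Linear(128 + 4, 128)      # predicts the next embedding

def intrinsic_reward(obs, action_onehot, next_obs):
    # Curiosity as forward-model prediction error in embedding space.
    with torch.no_grad():
        phi, phi_next = embed(obs), embed(next_obs)
        pred = forward_model(torch.cat([phi, action_onehot], dim=1))
        return F.mse_loss(pred, phi_next, reduction='none').mean(dim=1)

obs, next_obs = torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64)
action = F.one_hot(torch.randint(0, 4, (8,)), 4).float()      # 4 actions
print(intrinsic_reward(obs, action, next_obs).shape)          # torch.Size([8])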

3D Facial Model Fitting from 2D Videos

CMU CI2CV Lab
video
We design a system for dense, metric-scale 3D reconstruction of human faces from self-recorded 2D videos. We achieve this by solving for sparse 3D structure from sequential tracking of facial landmarks, followed by dense mesh fitting.

Perceptual Rate-Distortion Optimization of Motion Estimation

NTU Multimedia Processing & Communications Lab
paper
We design an optimization framework for video compression that uses perceptual metrics to find an optimal balance between encoding bitrate and decoded image distortion, achieving a 12.2% bitrate reduction over H.264/AVC encoders.
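The balance itself is the standard rate-distortion Lagrangian J = D + λR, with the distortion term D swapped for a perceptual metric; a toy sketch of the resulting decision rule (the distortion function and candidate set here are placeholders):

import numpy as np

def perceptual_distortion(block, reconstruction):
    # Placeholder for a perceptual metric (e.g. SSIM-based); plain MSE here.
    return float(np.mean((block - reconstruction) ** 2))

def rd_decide(block, candidates, lam):
    # Pick the (reconstruction, bits) candidate minimizing J = D + lam * R.
    costs = [perceptual_distortion(block, rec) + lam * bits
             for rec, bits in candidates]
    return int(np.argmin(costs))

rng = np.random.default_rng(0)
block = rng.random((8, 8))
candidates = [(block + rng.normal(scale=s, size=(8, 8)), bits)
              for s, bits in [(0.01, 120), (0.05, 60), (0.1, 30)]]
print(rd_decide(block, candidates, lam=0.001))  # quality vs. bitrate trade-off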

Disentangler Networks with Absolute and Relative Attributes

CMU 16-824 Visual Learning & Recognition
paper
We design a generative network to disentangle image embeddings into distinct and controllable attributes for image manipulation. Our network is able to learn from strong absolute supervision as well as relative pairwise rankings of attributes.
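For the relative pairwise rankings, a margin ranking loss on predicted attribute scores is a standard choice; a minimal sketch with an illustrative linear scorer:

import torch
import torch.nn as nn

scorer = nn.Linear(128, 1)                   # attribute score from an embedding
rank_loss = nn.MarginRankingLoss(margin=0.1)

emb_a, emb_b = torch.randn(16, 128), torch.randn(16, 128)
target = torch.ones(16)      # +1: image A should score higher than image B
loss = rank_loss(scorer(emb_a).squeeze(1), scorer(emb_b).squeeze(1), target)
loss.backward()
print(float(loss))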

Video Summarization via Convolutional Neural Networks

CMU 10-701 Machine Learning
report
We propose a novel objective function for deep networks to learn latent representations for video summarization, achieved through K-means clustering of the input video frames in the representation space at test time.
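A sketch of the test-time selection step with stand-in embeddings: cluster the frame representations with K-means and keep, per cluster, the frame nearest its centroid as the summary.

import numpy as np

def summarize(embeddings, k, iters=50, seed=0):
    # K-means in representation space; return one frame index per cluster.
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = embeddings[assign == j].mean(axis=0)
    dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
    return dists.argmin(axis=0)            # nearest frame to each centroid

frames = np.random.default_rng(1).random((300, 64))  # stand-in frame embeddings
print(summarize(frames, k=5))                        # 5 summary frame indices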

Virtual Piano Keyboard System

NTU Digital Circuit Design Lab
presentation demo
We develop a sensor-based virtual instrument system using only a paper keyboard. To emulate key presses, we design real-time fingertip detection and keyboard pattern recognition algorithms that operate directly on raw images captured by CCD image sensors.



© designed by Chen-Hsuan Lin.