Chen-Hsuan Lin

I am a 2nd-year Ph.D. student in the Robotics Institute at Carnegie Mellon University, working with Prof. Simon Lucey on computer vision & deep learning research. Before that, I completed my M.S. in Robotics at CMU and B.S. in Electrical Engineering at National Taiwan University.

My current research focuses on 3D vision problems, specifically using spatial alignment and generative modeling techniques. I am also interested in developing a deeper understanding of neural networks and in exploiting geometric knowledge to improve learning efficiency.

I am a recipient of the NVIDIA Graduate Fellowship. During internships at Adobe Research, I have also collaborated with Eli Shechtman, Oliver Wang, Ersin Yumer, Bryan Russell, Matthew Fisher, and Vova Kim.

CV (last updated: 12/2018) GitHub

Contact email: chlin (at) cmu (dot) edu

Updates

12/2018 I received the NVIDIA Graduate Fellowship (list here). Thank you NVIDIA!
02/2018 I have one paper accepted to CVPR 2018!
01/2018 I have one paper accepted to ICRA 2018.
11/2017 I have two papers accepted to AAAI 2018 (as an oral) and WACV 2018.
07/2017 My oral presentation at CVPR 2017 is online here.
02/2017 I have two papers accepted to CVPR 2017 (with one oral)!

Research

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
paper | arXiv preprint | website | code | BibTeX
@inproceedings{lin2018stgan,
  title={ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing},
  author={Lin, Chen-Hsuan and Yumer, Ersin and Wang, Oliver and Shechtman, Eli and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2018}
}
We propose a novel GAN architecture leveraging Spatial Transformer Networks to predict geometric corrections on objects that create realistic composite images. We demonstrate the efficacy of our method on various applications where ground-truth supervision is unavailable and the geometric predictions are driven purely by appearance realism.
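The compositing step that the generator's warp predictions feed into can be sketched in a few lines of NumPy. This is an illustrative toy (nearest-neighbor inverse warping, a hand-written 2x3 affine matrix standing in for the network's prediction), not the paper's implementation:

```python
import numpy as np

def warp_affine(img, p):
    """Inverse-warp a 2D image by the 2x3 affine matrix p (nearest-neighbor)."""
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = p[0, 0] * xs + p[0, 1] * ys + p[0, 2]   # source coordinates
    src_y = p[1, 0] * xs + p[1, 1] * ys + p[1, 2]   # for each output pixel
    sx = np.clip(np.round(src_x).astype(int), 0, W - 1)
    sy = np.clip(np.round(src_y).astype(int), 0, H - 1)
    return img[sy, sx]

def composite(fg, alpha, bg, p):
    """Warp the foreground and its alpha mask by p, then alpha-blend over bg."""
    fg_w = warp_affine(fg, p)
    a_w = warp_affine(alpha, p)
    return a_w * fg_w + (1.0 - a_w) * bg
```

In the actual model, the warp parameters come from the generator and the blended result feeds the discriminator, whose realism score is the only training signal.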

Deep-LK for Efficient Adaptive Object Tracking

Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, and Simon Lucey
IEEE International Conference on Robotics and Automation (ICRA), 2018
paper | arXiv preprint | BibTeX
@inproceedings{wang2018deeplk,
  title={Deep-LK for Efficient Adaptive Object Tracking},
  author={Wang, Chaoyang and Galoogahi, Hamed Kiani and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE International Conference on Robotics and Automation ({ICRA})},
  year={2018}
}
We demonstrate a theoretical relationship between Siamese regression networks and the Lucas-Kanade algorithm for object tracking. To combine the best of both worlds, we propose a novel object-tracking framework that learns feature representations for online adaptation of the regression parameters with respect to the tracked template images.
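For intuition, the classical inverse-compositional Lucas-Kanade update that the paper connects to Siamese regression can be sketched for a pure-translation warp; Deep-LK replaces raw pixels with learned features, but the Gauss-Newton structure below is the standard algorithm (function names are mine):

```python
import numpy as np

def sample(img, dx, dy):
    """Bilinearly sample img at (x + dx, y + dy), clamping at the borders."""
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + dx, 0.0, W - 1.001)
    y = np.clip(ys + dy, 0.0, H - 1.001)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bot = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot

def lk_translation(template, image, iters=50):
    """Inverse-compositional Lucas-Kanade for a pure 2D translation:
    the Jacobian and Hessian are precomputed once on the template."""
    T, I = template.astype(float), image.astype(float)
    gy, gx = np.gradient(T)
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)   # (num_pixels, 2)
    Hess = J.T @ J                                   # precomputed Gauss-Newton Hessian
    p = np.zeros(2)                                  # (dx, dy)
    for _ in range(iters):
        r = (sample(I, p[0], p[1]) - T).ravel()      # appearance residual
        dp = np.linalg.solve(Hess, J.T @ r)
        p -= dp                                      # inverse-compositional step
        if np.linalg.norm(dp) < 1e-8:
            break
    return p
```

The efficiency comes from the precomputation: only the residual and a small linear solve happen per iteration, which is what makes the frame-rate tracking in the paper plausible.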

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

Chen-Hsuan Lin, Chen Kong, and Simon Lucey
AAAI Conference on Artificial Intelligence (AAAI), 2018 (oral presentation)
paper | arXiv preprint | website | code | BibTeX
@inproceedings{lin2018learning,
  title={Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction},
  author={Lin, Chen-Hsuan and Kong, Chen and Lucey, Simon},
  booktitle={AAAI Conference on Artificial Intelligence ({AAAI})},
  year={2018}
}
We propose a simple generator network using 2D convolutional operations to efficiently reconstruct 3D object shapes in the form of dense point clouds. During optimization, we jointly apply geometric reasoning with 2D projection using the pseudo-renderer, a novel differentiable module that approximates the true rendering operation by synthesizing depth images at novel viewpoints.
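A non-differentiable toy version of the projection the pseudo-renderer approximates — splatting a point cloud into a depth map with a z-buffer — can be written as follows (orthographic camera, x/y already in pixel units; the paper's module additionally makes this operation differentiable):

```python
import numpy as np

def project_depth(points, size):
    """Splat an (N, 3) point cloud into a size x size depth map with a
    z-buffer: each pixel keeps the depth of its closest point."""
    depth = np.full((size, size), np.inf)
    xs = np.clip(np.round(points[:, 0]).astype(int), 0, size - 1)
    ys = np.clip(np.round(points[:, 1]).astype(int), 0, size - 1)
    np.minimum.at(depth, (ys, xs), points[:, 2])   # unbuffered per-pixel min
    return depth
```

Note the use of `np.minimum.at` rather than fancy indexing: when several points land in the same pixel, only the nearest one should survive, which buffered assignment would not guarantee.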

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior

Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, and Simon Lucey
IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
paper | arXiv preprint | BibTeX
@inproceedings{zhu2017object,
  title={Object-Centric Photometric Bundle Adjustment with Deep Shape Prior},
  author={Zhu, Rui and Wang, Chaoyang and Lin, Chen-Hsuan and Wang, Ziyan and Lucey, Simon},
  booktitle={IEEE Winter Conference on Applications of Computer Vision ({WACV})},
  year={2018}
}
We introduce learned shape priors in the form of deep neural networks into the Photometric Bundle Adjustment (PBA) framework. We propose to accommodate 3D point clouds generated by the shape prior within the optimization-based inference framework, which allows adaptation of the generated shapes to the given video sequences.

Inverse Compositional Spatial Transformer Networks

Chen-Hsuan Lin and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (oral presentation)
paper | arXiv preprint | website | presentation | code | BibTeX
@inproceedings{lin2017inverse,
  title={Inverse Compositional Spatial Transformer Networks},
  author={Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2017}
}
Inspired by the classical Lucas-Kanade algorithm for alignment, we improve upon Spatial Transformer Networks to jointly learn discriminative tasks and resolve large geometric redundancies through recurrent spatial transformations. Superior performance is demonstrated in various pure image alignment and joint alignment/classification tasks.
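The key mechanism — composing successive warp updates in parameter space so the input image is resampled only once — can be illustrated with plain affine composition (a sketch of the idea, not the network code):

```python
import numpy as np

def compose_affine(p1, p2):
    """Compose two 2x3 affine warps: the result applies p1, then p2."""
    A1 = np.vstack([p1, [0.0, 0.0, 1.0]])
    A2 = np.vstack([p2, [0.0, 0.0, 1.0]])
    return (A2 @ A1)[:2]

def compose_updates(updates):
    """Fold a sequence of predicted warp updates into one warp, so the
    input image is resampled a single time at the end."""
    p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # identity warp
    for dp in updates:
        p = compose_affine(p, dp)
    return p
```

Resampling once at the end, rather than after every predicted update as a stack of Spatial Transformers would, avoids accumulating interpolation blur and boundary cropping across iterations.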

Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image

Chen Kong, Chen-Hsuan Lin, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
paper | BibTeX
@inproceedings{kong2017using,
  title={Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image},
  author={Kong, Chen and Lin, Chen-Hsuan and Lucey, Simon},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2017}
}
We estimate the dense 3D shape of an object given a set of 2D landmarks and silhouette in a single image by choosing the closest single CAD model to the projected image. We propose a novel graph embedding based on local landmark correspondences to allow for sparse linear combinations of a CAD model dictionary.
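In its simplest form, fitting a shape as a linear combination of a CAD dictionary reduces to solving for basis weights by least squares; the sketch below drops the sparsity constraint, graph embedding, and camera estimation of the actual method:

```python
import numpy as np

def fit_shape(landmarks, dictionary):
    """Fit observed 3D landmarks (N, 3) as a linear combination of K
    dictionary shapes (K, N, 3) by least squares; returns the K weights."""
    K = dictionary.shape[0]
    B = dictionary.reshape(K, -1).T                  # (3N, K) basis matrix
    w, *_ = np.linalg.lstsq(B, landmarks.ravel(), rcond=None)
    return w
```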

The Conditional Lucas & Kanade Algorithm

Chen-Hsuan Lin, Rui Zhu, and Simon Lucey
European Conference on Computer Vision (ECCV), 2016
paper | arXiv preprint | website | code | BibTeX
@inproceedings{lin2016conditional,
  title={The Conditional Lucas \& Kanade Algorithm},
  author={Lin, Chen-Hsuan and Zhu, Rui and Lucey, Simon},
  booktitle={European Conference on Computer Vision ({ECCV})},
  pages={793--808},
  year={2016},
  organization={Springer International Publishing}
}
We propose an alignment algorithm, inspired by the Lucas-Kanade algorithm and the Supervised Descent Method, that learns from a conditional objective function and achieves significant improvements when learning from little training data. Superior performance is also achieved in applications including template tracking and facial landmark alignment.
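The conditional/SDM-style training step — regressing warp-parameter updates from appearance differences over synthetically perturbed examples — boils down to a ridge regression; a minimal sketch with hypothetical inputs:

```python
import numpy as np

def learn_regressor(features, deltas, lam=1e-3):
    """Ridge regression mapping appearance-difference features (S, d) to
    warp-parameter updates (S, p): R = (F'F + lam*I)^-1 F'D."""
    F = np.asarray(features, dtype=float)
    D = np.asarray(deltas, dtype=float)
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ D)
```

At test time, a warp update is predicted as `f @ R` from the current appearance difference `f`; the conditional formulation in the paper constrains how this regressor relates to the underlying generative image model, which is what helps in the low-data regime.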

Teaching

Teaching Assistant

Visual Learning and Recognition (CMU 16-824), Spring 2019
Instructor: Prof. Abhinav Gupta

Head Teaching Assistant

Computer Vision (CMU 16-720 A/B), Fall 2017
Instructor: Prof. Srinivasa Narasimhan, Prof. Simon Lucey, Prof. Yaser Sheikh

Teaching Assistant

Designing Computer Vision Apps (CMU 16-423), Fall 2015
Instructor: Prof. Simon Lucey

Academic Projects

Towards a More Curious Agent

CMU 10-703 Deep Reinforcement Learning & Control
paper
We formulate an agent's curiosity using the distribution over observations, with a scheme to model causality for policy networks. Our algorithm can solve navigation problems with very sparse rewards in VizDoom.

3D Facial Model Fitting from 2D Videos

CMU CI2CV Lab
video
We design a system that reconstructs 3D faces in metric scale from self-recorded 2D videos, solving for the 3D structure from tracked facial landmarks within a bundle adjustment framework.

Perceptual Rate-Distortion Optimization of Motion Estimation

NTU Multimedia Processing & Communications Lab
paper
We design an optimization framework for video compression that balances encoding bitrate against decoding distortion using perceptual metrics, achieving a 12.2% bitrate reduction over H.264/AVC encoders.
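For context, such rate-distortion optimization is conventionally posed as minimizing a Lagrangian over encoding decisions; the perceptual variant assumed here swaps in a perceptual distortion term:

```latex
m^{*} = \arg\min_{m} \; D_{\text{perc}}(m) + \lambda \, R(m)
```

where \(m\) ranges over motion/mode decisions, \(R(m)\) is the bitrate, \(D_{\text{perc}}(m)\) is the perceptual distortion, and \(\lambda\) sets the trade-off.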

Disentangler Networks with Absolute and Relative Attributes

CMU 16-824 Visual Learning & Recognition
paper
We design a deep network to disentangle image embeddings into distinct, controllable attributes for conditional image generation. Our network can learn from both absolute supervision and relative pairwise rankings.

Video Summarization via Convolutional Neural Networks

CMU 10-701 Machine Learning
report
We propose a novel loss function for deep networks to learn image representations for video summarization. At test time, summaries are produced by K-means clustering of the video frames in the feature space.
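The test-time step described above amounts to plain K-means over frame features followed by picking each cluster's nearest frame; a self-contained sketch (deterministic initialization, illustrative only):

```python
import numpy as np

def summarize(features, k, iters=20):
    """Select k keyframes by K-means in feature space: cluster the (F, d)
    frame features, then return the frame index nearest each centroid."""
    features = np.asarray(features, dtype=float)
    # simple deterministic init: centers spread evenly over the sequence
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
    return sorted(set(dists.argmin(axis=0)))   # keyframe indices
```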

Virtual Piano Keyboard System

NTU Digital Circuit Design Lab
presentation | demo
We develop a virtual touch instrument system using only a paper keyboard. We design real-time fingertip detection and keyboard pattern recognition algorithms operating on raw images captured by CCD sensors.



© designed by Chen-Hsuan Lin.