Abstract

Dense 3D object reconstruction from a single image has recently witnessed remarkable advances, but supervising neural networks with ground-truth 3D shapes is impractical due to the laborious process of creating paired image-shape datasets. Recent efforts have instead learned 3D reconstruction without 3D supervision, using RGB images annotated with 2D silhouettes, dramatically reducing the cost and effort of annotation. These techniques, however, remain impractical as they still require multi-view annotations of the same object instance during training. As a result, most experimental efforts to date have been limited to synthetic datasets.

In this paper, we address this issue and propose SDF-SRN, an approach that requires only a single view of objects at training time, offering greater utility for real-world scenarios. SDF-SRN learns implicit 3D shape representations to handle arbitrary shape topologies that may exist in the datasets. To this end, we derive a novel differentiable rendering formulation for learning signed distance functions (SDF) from 2D silhouettes. Our method outperforms the state of the art under challenging single-view supervision settings on both synthetic and real-world datasets.
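The core idea of rendering a 2D silhouette from a signed distance function can be illustrated with a toy example. The sketch below (an illustrative NumPy example, not the paper's differentiable formulation) casts one orthographic ray per pixel through an analytic sphere SDF: a pixel belongs to the silhouette exactly when the minimum SDF value along its ray is non-positive, i.e. the ray passes through the surface.

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=0.5):
    # Signed distance to a sphere: negative inside, zero on the surface,
    # positive outside.
    return np.linalg.norm(points - center, axis=-1) - radius

def render_silhouette(sdf, res=32, depth_samples=64):
    # Sample points on orthographic rays along +z, one ray per pixel.
    xs = np.linspace(-1, 1, res)
    ys = np.linspace(-1, 1, res)
    zs = np.linspace(-1, 1, depth_samples)
    X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1)       # (res, res, depth, 3)
    # A ray intersects the shape iff the minimum SDF along it is <= 0.
    min_sdf = sdf(pts).min(axis=-1)          # (res, res)
    return (min_sdf <= 0).astype(np.float32)  # binary silhouette mask

mask = render_silhouette(sphere_sdf)
```

In SDF-SRN the hard thresholding above is replaced by a differentiable rendering formulation so that gradients from the 2D silhouette loss can flow back into the implicit shape network; see the paper for the actual derivation.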

Video

Short talk

Code, dataset, and pretrained models

The code is hosted on GitHub (PyTorch).
Instructions for downloading the datasets and pretrained models are also provided on the GitHub page.

Publications

NeurIPS 2020 paper: [ link ]

Supplementary material: [ link ]

arXiv preprint: https://arxiv.org/abs/2010.10505

BibTeX:

@inproceedings{lin2020sdfsrn,
  title={SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images},
  author={Lin, Chen-Hsuan and Wang, Chaoyang and Lucey, Simon},
  booktitle={Advances in Neural Information Processing Systems ({NeurIPS})},
  year={2020}
}

Poster


Acknowledgements

We thank Deva Ramanan, Abhinav Gupta, Andrea Vedaldi, Ioannis Gkioulekas, Wei-Chiu Ma, Ming-Fang Chang, Yufei Ye, Stanislav Panev, Rahul Venkatesh, and the reviewers for helpful discussions and feedback on the paper. Chen-Hsuan Lin is supported by the NVIDIA Graduate Fellowship. This work was supported by the CMU Argo AI Center for Autonomous Vehicle Research.