We present a fully data-driven method to compute depth from diverse monocular video sequences that contain large numbers of non-rigid objects, e.g., people. To learn reconstruction cues for non-rigid scenes, we introduce a new dataset (WSVD) consisting of stereo videos scraped from YouTube. This dataset covers a wide variety of scene types and features many non-rigid objects.
Please use the following citation when referencing the dataset:
@misc{wang2019web,
title={Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes},
author={Chaoyang Wang and Simon Lucey and Federico Perazzi and Oliver Wang},
year={2019},
eprint={1904.11112},
archivePrefix={arXiv},
primaryClass={cs.CV}
}