We automatically collected a 5003-image dataset from popular Hollywood movies. The images were obtained by running a state-of-the-art person detector on every tenth frame of 30 movies. People detected with high confidence (roughly 20K candidates) were then sent to the crowdsourcing marketplace Amazon Mechanical Turk to obtain ground-truth labeling. Each image was annotated by five Turkers, for $0.01 each, to label 10 upper-body joints. We took the median of the five labelings for each image to be robust to outlier annotations. Finally, we manually rejected images in which the person was occluded or severely non-frontal. We set aside 20% (1016 images) of the data for testing.
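
To illustrate why a median over five annotators is robust to a single bad labeling, here is a minimal sketch. It assumes the median is taken independently per joint and per coordinate, which the description above does not specify; the function name `median_annotation` and the sample click coordinates are hypothetical and are not part of the dataset tools.

```python
from statistics import median


def median_annotation(annotations):
    """Merge several annotators' labels with a per-joint, per-coordinate median.

    annotations: list of labelings, each a list of (x, y) joint positions
    given in the same joint order. (Assumed layout, for illustration only.)
    """
    num_joints = len(annotations[0])
    if any(len(a) != num_joints for a in annotations):
        raise ValueError("all annotations must label the same number of joints")

    merged = []
    for j in range(num_joints):
        xs = [a[j][0] for a in annotations]
        ys = [a[j][1] for a in annotations]
        merged.append((median(xs), median(ys)))
    return merged


# Five hypothetical labelings of two joints; the fourth annotator is an outlier.
clicks = [
    [(100, 50), (120, 80)],
    [(102, 51), (119, 82)],
    [(101, 49), (121, 79)],
    [(300, 400), (5, 5)],
    [(99, 50), (120, 81)],
]
consensus = median_annotation(clicks)
print(consensus)  # [(101, 50), (120, 80)]
```

The outlier labeling has no effect on the result here, whereas a mean would be pulled far from the other four annotators.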
File | Size | Description |
---|---|---|
FLIC.zip | 287MB | 5003 examples used in our CVPR13 MODEC paper. |
FLIC-full.zip | 1.2GB | 20928 examples, a superset of FLIC that adds more difficult examples (see below). NOTE: please do not use this as training data if testing on the FLIC test set. Because it is a superset of the original FLIC dataset, it contains the FLIC test images, and training on it will lead to overfitting. Instead, choose a sensible split where no two frames from the same movie shot cross the train/test divide. |
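
One way to build a split of FLIC-full in which no movie shot crosses the train/test divide is to assign whole shots, rather than individual frames, to each side. The sketch below is only an illustration under stated assumptions: the `movie` and `shot` fields are hypothetical, and you would need to derive a shot identifier from whatever metadata you have for each example.

```python
import random


def split_by_group(examples, group_of, test_fraction=0.2, seed=0):
    """Split examples so that every group lands entirely in train or in test.

    group_of: function mapping an example to a hashable, sortable group key
    (here, a hypothetical shot identifier).
    test_fraction: fraction of *groups* (not images) assigned to the test side.
    """
    groups = sorted({group_of(e) for e in examples})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(groups)

    num_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:num_test])

    train = [e for e in examples if group_of(e) not in test_groups]
    test = [e for e in examples if group_of(e) in test_groups]
    return train, test


# Hypothetical records: 5 shots with 4 frames each.
examples = [
    {"movie": "movie_a", "shot": s, "frame": f}
    for s in range(5)
    for f in range(4)
]
shot_id = lambda e: (e["movie"], e["shot"])
train, test = split_by_group(examples, shot_id)
```

Because the fraction applies to shots, the resulting image counts only approximate the requested ratio when shots differ in length.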
Please use the following citation when referencing the dataset:
@inproceedings{modec13,
title={MODEC: Multimodal Decomposable Models for Human Pose Estimation},
author={Sapp, Benjamin and Taskar, Ben},
booktitle={In Proc. CVPR},
year={2013},
}