Our dataset consists of manually contour-labeled pedestrian images
captured from a vehicle-mounted, calibrated stereo
camera rig in an urban environment. For each pedestrian
cutout we provide a 24-bit PNG image, a floating-point disparity map,
and a ground-truth shape.
Dense stereo is computed using the semi-global matching
algorithm (H. Hirschmueller, "Stereo processing by semi-global
matching and mutual information," IEEE Trans. on PAMI,
30(2):328-341, 2008).
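Because the rig is calibrated, a disparity value can be turned into a metric depth with the standard rectified-stereo relation Z = f * B / d. The sketch below only illustrates that relation: the names focal_px and baseline_m are hypothetical, the actual calibration values are not given in this description, and treating non-positive disparities as invalid is an assumption about how missing matches are encoded.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert one disparity (pixels) to depth (metres) for a rectified stereo pair.

    focal_px and baseline_m are hypothetical calibration inputs; use the
    values that belong to the actual camera rig. Non-positive disparities
    are assumed to mark invalid pixels and yield None.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def disparity_map_to_depth(disparity_rows, focal_px, baseline_m):
    """Apply disparity_to_depth to every pixel of a row-major disparity map."""
    return [
        [disparity_to_depth(d, focal_px, baseline_m) for d in row]
        for row in disparity_rows
    ]
```

For example, with an assumed focal length of 800 pixels and an assumed baseline of 0.25 m, a disparity of 10 pixels corresponds to a depth of 20 m.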
The 785 image cutouts have heights between 34 and 468 pixels and widths between 11 and 267 pixels. In our BMVC’13 publication, only samples with a height greater than 120 pixels are used. We provide the samples with an additional 10 % border on each side.
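The height threshold and the border can be handled with a few lines of arithmetic. The following is a minimal sketch, not the procedure used for the publication. It assumes that the 10 % border is measured relative to the unpadded cutout size, so a provided sample is 1.2 times as wide and as high as the cutout itself. Whether the quoted height range and the 120-pixel threshold refer to the padded or the unpadded size is not stated here, so the caller has to decide which height to pass in. All function names are hypothetical.

```python
BORDER_FRACTION = 0.10  # 10 % border on each side, as stated in the text
MIN_HEIGHT_PX = 120     # height threshold mentioned for the BMVC'13 experiments


def inner_box(padded_width, padded_height, border_fraction=BORDER_FRACTION):
    """Return (left, top, right, bottom) of the cutout without its border.

    Assumption: the border on each side is border_fraction of the unpadded
    size, i.e. padded = (1 + 2 * border_fraction) * unpadded.
    """
    scale = 1.0 + 2.0 * border_fraction
    left = round((padded_width - padded_width / scale) / 2)
    top = round((padded_height - padded_height / scale) / 2)
    return left, top, padded_width - left, padded_height - top


def is_tall_enough(height_px, min_height=MIN_HEIGHT_PX):
    """True if the sample is strictly taller than the threshold."""
    return height_px > min_height
```

The returned box uses the (left, top, right, bottom) convention with exclusive right and bottom edges, so it can be passed to common image-cropping routines that follow the same convention.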