The Out the Window (OTW) dataset is a crowdsourced activity dataset containing 5,668 instances of 17 activities from the NIST Activities in Extended Video (ActEV) challenge. These videos are crowdsourced from workers on Amazon Mechanical Turk using a novel scenario acting strategy, which collects multiple instances of natural activities per scenario. Turkers are instructed to lean their mobile device against an upper story window overlooking an outdoor space, walk outside to perform a scenario involving people, vehicles and objects, and finally upload the video to us for annotation. Performance evaluation for activity classification on VIRAT Ground 2.0 shows that the OTW dataset provides an 8.3% improvement in mean classification accuracy, and a 12.5% improvement on the most challenging activities involving people with vehicles.
Annotation files are in CSV format with the following columns:
[Video ID, Activity ID, Actor ID, Activity or Object Type, Frame Number, XMin, YMin, XMax, YMax, Labeled]
Example annotations for a single activity from ./homes/annotations.csv:
00000,0,00038,dismounting bike,252,82,1165,255,1586,True
00000,0,00038,person,252,85,1165,211,1446,True
00000,0,00038,bicycle,253,103,1230,250,1458,False
00000,0,00038,dismounting bike,253,85,1165,250,1458,False
00000,0,00038,person,253,85,1165,211,1446,False
00000,0,00038,bicycle,254,103,1229,254,1455,False
00000,0,00038,dismounting bike,254,85,1165,254,1455,False
00000,0,00038,person,254,85,1165,211,1446,False
00000,0,00038,bicycle,255,102,1226,255,1449,False
00000,0,00038,dismounting bike,255,85,1165,255,1449,False
00000,0,00038,person,255,85,1165,211,1446,False
Frames are 0-indexed. Given an activity directory (homes or lots) and a frame number, the following Python snippet gives the relative path of the corresponding frame image file:
import os
img_file_name = os.path.join(".", "homes", "%08d.jpg" % framenum)  # use "lots" for the lots directory
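Building on that snippet, a small sketch that turns an annotation row into a frame path and array slices for cropping its bounding box. It assumes frames are named exactly as in the snippet, that boxes are pixel coordinates with (XMin, YMin) at the top-left, and that XMax/YMax are inclusive; the inclusivity convention is not stated in this description, so check it against the data. `frame_path` and `box_slices` are hypothetical helper names.

```python
import os
from typing import Tuple

ACTIVITY_DIRS = ("homes", "lots")


def frame_path(activity_dir: str, framenum: int) -> str:
    """Relative path of a 0-indexed frame, using the %08d.jpg naming from the snippet."""
    if activity_dir not in ACTIVITY_DIRS:
        raise ValueError(f"activity_dir must be one of {ACTIVITY_DIRS}")
    if framenum < 0:
        raise ValueError("frame numbers are 0-indexed and non-negative")
    return os.path.join(".", activity_dir, "%08d.jpg" % framenum)


def box_slices(xmin: int, ymin: int, xmax: int, ymax: int) -> Tuple[slice, slice]:
    """(row, column) slices for cropping an H x W (x C) image array.

    Assumes XMax/YMax are inclusive pixel coordinates, hence the +1.
    """
    return slice(ymin, ymax + 1), slice(xmin, xmax + 1)
```

With a frame loaded as a NumPy array, for example via `imageio.v2.imread(frame_path("homes", 252))`, the box from the first example row is `img[box_slices(82, 1165, 255, 1586)]`.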
d_otw_to_diva.json is a JSON dictionary that maps each OTW label string to its equivalent DIVA label string. For example:
OTW : DIVA
"carrying (large)":"transport_heavycarry",
"pushing cart":"pull"
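A minimal sketch of applying the mapping with Python's standard json module. It assumes d_otw_to_diva.json is a flat object of string keys to string values, as in the excerpt; OTW labels without an entry are returned as None rather than guessed. `load_otw_to_diva` and `to_diva` are hypothetical helpers.

```python
import json
from typing import Dict, Optional


def load_otw_to_diva(path: str = "d_otw_to_diva.json") -> Dict[str, str]:
    """Load the OTW -> DIVA label dictionary."""
    with open(path) as f:
        mapping = json.load(f)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object mapping OTW labels to DIVA labels")
    return mapping


def to_diva(otw_label: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the DIVA label for an OTW label, or None if it has no mapping."""
    return mapping.get(otw_label)
```

For example, with the two entries shown, `to_diva("pushing cart", mapping)` returns `"pull"`, while an unmapped label such as `"person"` returns `None`.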
extract_frames.py is a Python 3 script for extracting frames from either the homes or the lots dataset.
Usage:
pip3 install imageio imageio-ffmpeg
python3 extract_frames.py homes 10 5
python3 extract_frames.py lots
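Extraction takes a while. The first command extracts frames from the homes dataset to ./homes/frames, parallelizing the extraction over 10 workers (this number can be increased or decreased) with a PNG compression level of 5 (1 = fastest with the least compression, 9 = slowest with the most compression). The second command extracts frames from the lots dataset using the script's default settings.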
Please use the following citation when referencing the dataset:
@article{Castan2019OutTW,
title={Out the Window: A Crowd-Sourced Dataset for Activity Classification in Surveillance
Video},
author={Greg Casta{\~n}{\'o}n and Nathan Shnidman and T. Anderson and J. Byrne},
journal={ArXiv},
year={2019},
volume={abs/1908.10899}
}