DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors for aerial imagery. We will continue to update DOTA so that it grows in size and scope and reflects evolving real-world conditions. DOTA-v1.0 contains 2806 aerial images from different sensors and platforms. Each image ranges in size from about 800 × 800 to 4000 × 4000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. These images were annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA-v1.0 images contain 188,282 instances, each labeled by an arbitrary (8 d.o.f.) quadrilateral.
The images in the DOTA-v1.0 dataset are mainly collected from Google
Earth; some are taken by the JL-1 satellite, and others by the GF-2
satellite of the China Centre for Resources Satellite Data and
Application.
Use of the images from Google Earth must respect the corresponding
"Google Earth" terms of use.
In the dataset, each instance's location is annotated by a
quadrilateral bounding box, denoted as "x1, y1, x2, y2, x3, y3, x4, y4",
where (xi, yi) is the position of the i-th vertex of the oriented
bounding box in the image. The vertices are arranged in clockwise
order. In the visualization of the adopted annotation method, the
yellow point represents the starting point, which refers to: (a) the
top-left corner of a plane, (b) the top-left corner of a large vehicle,
and (c) the center of a sector-shaped baseball diamond.
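Because the vertex order matters for oriented-box pipelines, the winding of an annotated quadrilateral can be checked with the shoelace formula. The following is a minimal sketch, not part of any official DOTA tooling; it assumes the image coordinate convention in which y grows downward, so a positive signed area corresponds to vertices listed clockwise on screen:

```python
# Sketch: check the winding of a quadrilateral with the shoelace formula.
# In image coordinates (y grows downward), a positive signed area means
# the vertices are listed in clockwise order as seen on screen.
def signed_area(poly):
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return s / 2.0

def is_clockwise(poly):
    # Positive under the y-down image convention.
    return signed_area(poly) > 0

# Axis-aligned box, listed clockwise on screen (top-left first):
box = [(0, 0), (10, 0), (10, 10), (0, 10)]
```

Here `is_clockwise(box)` is true, while reversing the vertex list flips the sign of the area.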
In addition to the location annotation, a category label is assigned to
each instance, drawn from the 15 selected categories listed below, and
a difficult label is provided that indicates whether the instance is
difficult to detect (1 for difficult, 0 for not difficult).
The object categories in DOTA-v1.0
include: plane, ship, storage tank, baseball diamond, tennis court,
basketball court, ground track field, harbor, bridge, large vehicle, small
vehicle, helicopter, roundabout, soccer ball field and swimming pool.
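For training a detector, these category names are typically mapped to integer labels. The following is an illustrative sketch: the hyphenated spellings follow a common convention for DOTA label files, and the index order is an assumption, not an official class numbering:

```python
# The 15 DOTA-v1.0 categories; hyphenated spellings and index order are
# illustrative assumptions, not an official numbering.
CATEGORIES = [
    "plane", "ship", "storage-tank", "baseball-diamond", "tennis-court",
    "basketball-court", "ground-track-field", "harbor", "bridge",
    "large-vehicle", "small-vehicle", "helicopter", "roundabout",
    "soccer-ball-field", "swimming-pool",
]
# Map category name -> integer label for a detector head.
CAT_TO_IDX = {name: i for i, name in enumerate(CATEGORIES)}
```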
Annotations for an image are saved in
a text file with the same file name. The first line gives 'imagesource'
(GoogleEarth, GF-2, or JL-1). The second line gives 'gsd' (ground
sample distance, the physical size of one image pixel, in meters); if
the 'gsd' is missing, it is annotated as 'null'. From the third line to
the last line, the annotation for each instance is given, one per line.
The annotation format is:
'imagesource':imagesource
'gsd':gsd
x1, y1, x2, y2, x3, y3, x4, y4, category, difficult
x1, y1, x2, y2, x3, y3, x4, y4, category, difficult
...
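The format above can be read with a small parser. The following is a minimal sketch; the `Instance` class and function name are illustrative, not part of any official devkit, and it assumes fields within an instance line are separated by spaces and/or commas:

```python
# Sketch of a parser for the DOTA annotation format described above.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Instance:
    polygon: List[Tuple[float, float]]  # 4 vertices, clockwise from the starting point
    category: str
    difficult: bool                     # True if the 'difficult' flag is 1

def parse_annotation(text: str):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    source = lines[0].split(":", 1)[1]       # 'imagesource':GoogleEarth / GF-2 / JL-1
    gsd_str = lines[1].split(":", 1)[1]      # 'gsd':... ('null' if missing)
    gsd: Optional[float] = None if gsd_str == "null" else float(gsd_str)
    instances = []
    for ln in lines[2:]:
        # Accept comma- or space-separated fields (an assumption, see lead-in).
        parts = ln.replace(",", " ").split()
        coords = [float(v) for v in parts[:8]]
        polygon = list(zip(coords[0::2], coords[1::2]))
        instances.append(Instance(polygon, parts[8], parts[9] == "1"))
    return source, gsd, instances
```

Calling `parse_annotation` on the contents of one annotation file yields the image source, the ground sample distance (or `None`), and a list of annotated instances.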
@InProceedings{Xia_2018_CVPR,
author = {Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
title = {DOTA: A Large-Scale Dataset for Object Detection in Aerial Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
@InProceedings{Ding_2019_CVPR,
author = {Ding, Jian and Xue, Nan and Long, Yang and Xia, Gui-Song and Lu, Qikai},
title = {Learning RoI Transformer for Detecting Oriented Objects in Aerial Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}