We have published the Audi Autonomous Driving Dataset (A2D2) to support startups and academic researchers working on autonomous driving. Equipping a vehicle with a multimodal sensor suite, recording a large dataset, and labelling it are time- and labour-intensive. Our dataset removes this high entry barrier and frees researchers and developers to focus on developing new technologies instead. The dataset features 2D semantic segmentation, 3D point clouds, 3D bounding boxes, and vehicle bus data.
Our sensor suite consists of six cameras, five LiDAR sensors, and an automotive gateway for recording bus data. This configuration provides 360° coverage of the environment with camera and LiDAR. The bus data give information about vehicle state and driver control input.
Figure: Overview of sensor carrier with sensors (top view)
Our vehicle is equipped with additional hardware for recording data from the sensor suite and vehicle bus. The cameras are connected to an embedded computer via LVDS, while the LiDAR sensors are connected via a 1G-Ethernet switch. Each LiDAR sensor is connected to a GNSS receiver which acts as a clock. A further GNSS clock serves as a time master for the gateway and embedded computer. The bus gateway connects to the embedded computer via 1G-Ethernet. All data is stored on a crash-safe network storage device, equipped with 48 TB of SSD storage, and accessed via 10G-Ethernet.
Figure: Overview of the recording hardware and its setup
All sensor signals are timestamped in UTC format. Camera images are timestamped when they arrive at the embedded computer, which is synchronised to the time master. Bus data are timestamped at the gateway, which is also synchronised to the time master. LiDAR signals are timestamped at the sensors, which get their time from GNSS.
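Because all modalities share a common UTC timebase, records from different sensors can be associated by nearest timestamp. Below is a minimal Python sketch of such an association step; the function name and the 50 ms tolerance are illustrative assumptions, not part of the dataset tooling:

    import numpy as np

    def match_nearest(cam_ts, lidar_ts, tol=0.05):
        """For each camera timestamp, find the index of the closest LiDAR
        timestamp; return -1 where no LiDAR frame lies within `tol` seconds.

        cam_ts, lidar_ts: 1-D arrays of UTC timestamps in seconds, sorted.
        """
        idx = np.searchsorted(lidar_ts, cam_ts)          # insertion points
        idx = np.clip(idx, 1, len(lidar_ts) - 1)
        left, right = lidar_ts[idx - 1], lidar_ts[idx]   # neighbouring stamps
        idx -= cam_ts - left < right - cam_ts            # pick the closer one
        return np.where(np.abs(lidar_ts[idx] - cam_ts) <= tol, idx, -1)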
The LiDAR sensor poses relative to the vehicle are determined by direct measurement of position and orientation when the sensors are mounted on the vehicle.
Camera poses with respect to the vehicle are determined by direct in-situ measurements of position and orientation.
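In practice, such a measured pose is typically stored as a 4x4 rigid transform. A small sketch, assuming positions in metres and roll/pitch/yaw angles in degrees; the helper name and the example mounting values are hypothetical:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def sensor_to_vehicle(translation_m, rpy_deg):
        """Build a 4x4 homogeneous transform from a measured sensor position
        (metres, vehicle frame) and roll/pitch/yaw orientation (degrees)."""
        T = np.eye(4)
        T[:3, :3] = Rotation.from_euler("xyz", rpy_deg, degrees=True).as_matrix()
        T[:3, 3] = translation_m
        return T

    # Hypothetical mounting values for a front camera:
    T_cam_front = sensor_to_vehicle([1.7, 0.0, 1.5], [0.0, 0.0, 0.0])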
We use one LiDAR as a reference and initialise the other LiDAR sensor poses to their measured positions and orientations. Next, an Iterative Closest Point algorithm is used to refine the poses of the other LiDAR sensors within the vehicle coordinate system. This registration uses a recording of a static environment with a static ego vehicle and does not require any fiducial targets.
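As an illustration, one such refinement step could look like the sketch below, built on Open3D's point-to-point ICP. Open3D is our assumption here, not necessarily the implementation used for the dataset:

    import numpy as np
    import open3d as o3d

    def refine_pose_icp(source_xyz, reference_xyz, T_init, max_dist=0.5):
        """Refine one LiDAR's pose against the reference LiDAR's point cloud.

        source_xyz, reference_xyz: (N, 3) arrays from a static scene.
        T_init: 4x4 initial guess from the measured mounting pose.
        Returns the refined 4x4 source-to-reference transform.
        """
        src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_xyz))
        ref = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(reference_xyz))
        result = o3d.pipelines.registration.registration_icp(
            src, ref, max_dist, T_init,
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        return result.transformation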
The camera poses are optimised using camera and LiDAR recordings of fiducial targets (e.g. checkerboards). Additionally, a low-speed driving scene is used to improve calibration of sensor orientation. This process uses features (e.g. edges) in camera and LiDAR data to optimise relative poses.
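For intuition, the target-detection part of such a pipeline might use OpenCV's checkerboard routines, as in the following sketch. The board geometry and helper are illustrative assumptions; the full camera-LiDAR optimisation is more involved:

    import cv2
    import numpy as np

    # Checkerboard geometry (illustrative values, not the ones used for A2D2).
    PATTERN = (9, 6)     # inner corners per row, per column
    SQUARE_M = 0.10      # square edge length in metres

    # 3D corner positions in the board's own frame, on the z = 0 plane.
    obj_pts = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    obj_pts[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_M

    def board_pose(image_bgr, K, dist):
        """Return the checkerboard pose (rvec, tvec) in the camera frame,
        given known intrinsics K and distortion coefficients `dist`."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        ok, corners = cv2.findChessboardCorners(gray, PATTERN)
        if not ok:
            return None
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        ok, rvec, tvec = cv2.solvePnP(obj_pts, corners, K, dist)
        return (rvec, tvec) if ok else None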
The dataset features 41,280 frames with semantic segmentation in 38 categories. Each pixel in an image is given a label describing the type of object it represents, such as pedestrian, car, or vegetation.
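Semantic labels of this kind are commonly distributed as colour-coded label images. The sketch below decodes such an image into per-pixel class ids; the colour-to-class entries shown are assumptions for illustration, and the dataset ships its own complete mapping:

    import numpy as np
    from PIL import Image

    # Illustrative subset of a colour-to-class mapping (assumed colours).
    COLOR_TO_CLASS = {
        (255, 0, 0): "car",
        (204, 153, 255): "pedestrian",
        (147, 253, 194): "vegetation",
    }

    def decode_label_image(path):
        """Convert a colour-coded label PNG into an (H, W) array of class ids."""
        rgb = np.asarray(Image.open(path).convert("RGB"))
        class_ids = np.full(rgb.shape[:2], -1, dtype=np.int16)  # -1 = unknown
        for cid, (color, _name) in enumerate(COLOR_TO_CLASS.items()):
            class_ids[np.all(rgb == color, axis=-1)] = cid
        return class_ids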
Point cloud segmentation is produced by fusing semantic pixel information and LiDAR point clouds. Each 3D point is thereby assigned an object type label. This relies on accurate camera-LiDAR registration.
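The fusion step amounts to projecting each LiDAR point into the segmented image and reading off the pixel label. A numpy sketch, assuming a pinhole camera with intrinsics K and an extrinsic transform T_cam_from_vehicle (names and the 0.1 m near-plane cutoff are our assumptions):

    import numpy as np

    def label_points(points_xyz, class_ids, K, T_cam_from_vehicle):
        """Assign each LiDAR point (vehicle frame) the semantic class of the
        pixel it projects to; points outside the image get label -1.

        points_xyz: (N, 3); class_ids: (H, W) from the segmented image;
        K: 3x3 intrinsics; T_cam_from_vehicle: 4x4 extrinsics."""
        h, w = class_ids.shape
        pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]   # homogeneous coords
        cam = (T_cam_from_vehicle @ pts_h.T)[:3]              # camera frame
        in_front = cam[2] > 0.1                               # drop points behind camera
        uv = K @ cam[:, in_front]
        uv = (uv[:2] / uv[2]).round().astype(int)             # pixel coordinates
        valid = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
        labels = np.full(len(points_xyz), -1, dtype=np.int16)
        idx = np.flatnonzero(in_front)[valid]
        labels[idx] = class_ids[uv[1, valid], uv[0, valid]]
        return labels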
3D bounding boxes are provided for 12,499 frames. LiDAR points within the field of view of the front camera are labelled with 3D bounding boxes. We annotate 14 classes relevant to driving, such as cars, pedestrians, and buses.
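A common minimal representation of such a box is a centre, a size, and a heading angle; the sketch below derives the eight corners from that form. The dataset's own annotation format may carry additional fields, so this is a simplification:

    import numpy as np

    def box_corners(center, size, yaw):
        """Return the 8 corners of a 3D bounding box as an (8, 3) array.

        center: (x, y, z) of the box centre; size: (length, width, height);
        yaw: heading angle in radians about the vertical axis."""
        l, w, h = size
        x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * l / 2
        y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
        z = np.array([-1, -1, -1, -1,  1,  1,  1,  1]) * h / 2
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # rotation about z
        return (R @ np.stack([x, y, z])).T + np.asarray(center)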
Please use the following citation when referencing the dataset:
@article{geyer2020a2d2,
  title={{A2D2}: Audi Autonomous Driving Dataset},
  author={Geyer, Jakob and Kassahun, Yohannes and Mahmudi, Mentar and Ricou, Xavier and Durgesh, Rupesh and Chung, Andrew S and Hauswald, Lorenz and Pham, Viet Hoang and M{\"u}hlegg, Maximilian and Dorn, Sebastian and others},
  journal={arXiv preprint arXiv:2004.06320},
  year={2020}
}