The nuScenes dataset is a large-scale autonomous driving dataset with 3D object annotations. It features:
● Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS)
● 1000 scenes of 20s each
● 1,400,000 camera images
● 390,000 lidar sweeps
● Two diverse cities: Boston and Singapore
● Left versus right hand traffic
● Detailed map information
● 1.4M 3D bounding boxes manually annotated for 23 object classes
● Attributes such as visibility, activity and pose
● New: 1.1B lidar points manually annotated for 32 classes
● New: Explore nuScenes on SiaSearch
● Free for non-commercial use
● For a commercial license contact nuScenes@motional.com
For the nuScenes dataset, we collect approximately 15h of driving data in Boston and Singapore. For the full nuScenes dataset, we publish data from Boston Seaport and Singapore’s One North, Queenstown and Holland Village districts. Driving routes are carefully chosen to capture challenging scenarios. We aim for a diverse set of locations, times and weather conditions. To balance the class frequency distribution, we include more scenes with rare classes (such as bicycles). Using these criteria, we manually select 1000 scenes of 20s duration each. These scenes are carefully annotated by human experts. The annotator instructions can be found in the devkit repository.
We use two Renault Zoe cars with an identical sensor layout to drive in Boston and Singapore. The data was gathered from a research platform and is not indicative of the setup used in Motional products. Please refer to the above figure for the placement of the sensors. We release data from the following sensors:
To achieve a high quality multi-sensor dataset, it is essential to calibrate the extrinsics and intrinsics of every sensor. We express extrinsic coordinates relative to the ego frame, i.e. the midpoint of the rear vehicle axle. The most relevant steps are described below:
LIDAR extrinsics:
We use a laser liner to accurately measure the relative location of the LIDAR to the ego frame.
Camera extrinsics:
We place a cube-shaped calibration target in front of the camera and LIDAR sensors. The calibration target consists of three orthogonal planes with known patterns. After detecting the patterns we compute the transformation matrix from camera to LIDAR by aligning the planes of the calibration target. Given the LIDAR to ego frame transformation computed above, we can then compute the camera to ego frame transformation and the resulting extrinsic parameters.
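The chain of transforms described above can be sketched as follows. This is a minimal illustration with placeholder 4x4 homogeneous matrices, not the actual calibration values or pipeline:

import numpy as np

# Placeholder homogeneous transforms (rotation + translation); the real values come from calibration.
T_cam_to_lidar = np.eye(4)   # estimated by aligning the planes of the calibration target
T_lidar_to_ego = np.eye(4)   # measured with the laser liner relative to the rear-axle midpoint

# Composing the two yields the camera extrinsics in the ego frame.
T_cam_to_ego = T_lidar_to_ego @ T_cam_to_lidar

# A point in camera coordinates can then be expressed in the ego frame.
p_cam = np.array([1.0, 0.0, 2.0, 1.0])   # homogeneous point
p_ego = T_cam_to_ego @ p_cam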
RADAR extrinsics:
We mount the radar in a horizontal position. Then we collect radar measurements by driving in an urban environment. After filtering radar returns for moving objects, we calibrate the yaw angle using a brute force approach to minimize the compensated range rates for static objects.
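The yaw search itself can be sketched as follows, assuming arrays of detection azimuths, measured range rates and ego speeds that have already been filtered down to static objects; all names are illustrative and not part of the dataset or devkit:

import numpy as np

def calibrate_radar_yaw(azimuth, range_rate, ego_speed):
    # Brute-force search for the radar yaw offset that minimizes the
    # compensated range rates of static objects (illustrative sketch only).
    yaw_candidates = np.deg2rad(np.arange(-10.0, 10.0, 0.01))
    best_yaw, best_cost = None, np.inf
    for yaw in yaw_candidates:
        # For a static object, the expected radial velocity is the projection
        # of the (negated) ego motion onto the line of sight.
        expected = -ego_speed * np.cos(azimuth + yaw)
        cost = np.sum((range_rate - expected) ** 2)
        if cost < best_cost:
            best_yaw, best_cost = yaw, cost
    return best_yaw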
Camera intrinsic calibration:
We use a calibration target board with a known set of patterns to infer the intrinsic and distortion parameters of the camera.
In order to achieve good cross-modality data alignment between the LIDAR and the cameras, the exposure of a camera is triggered when the top LIDAR sweeps across the center of the camera’s FOV. The timestamp of the image is the exposure trigger time; and the timestamp of the LIDAR scan is the time when the full rotation of the current LIDAR frame is achieved. Given that the camera’s exposure time is nearly instantaneous, this method generally yields good data alignment. Note that the cameras run at 12Hz while the LIDAR runs at 20Hz. The 12 camera exposures are spread as evenly as possible across the 20 LIDAR scans, so not all LIDAR scans have a corresponding camera frame. Reducing the frame rate of the cameras to 12Hz helps to reduce the compute, bandwidth and storage requirement of the perception system.
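Because not every LIDAR scan has a corresponding camera frame, a simple way to pair the two streams is to match each image to the sweep closest in time. A minimal sketch, assuming lists of sample_data timestamps in microseconds:

import numpy as np

def match_camera_to_lidar(cam_timestamps, lidar_timestamps):
    # For each camera exposure, return the index of the closest LIDAR sweep in time.
    cam = np.asarray(cam_timestamps)
    lidar = np.asarray(lidar_timestamps)
    # |cameras| x |sweeps| difference matrix; fine for the ~12 Hz / 20 Hz streams of a single scene.
    diffs = np.abs(cam[:, None] - lidar[None, :])
    return diffs.argmin(axis=1)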
It is our priority to protect the privacy of third parties. For this purpose we use state-of-the-art object detection techniques to detect license plates and faces. We aim for a high recall and remove false positives that do not overlap with the reprojections of the known person and car boxes. Eventually we use the output of the object detectors to blur faces and license plates in the images of nuScenes.
This document describes the database schema used in nuScenes. All annotations and metadata (including calibration, maps, vehicle coordinates etc.) are covered in a relational database. The database tables are listed below. Every row can be identified by its unique primary key token. Foreign keys such as sample_token may be used to link to the token of the table sample. Please refer to the tutorial for an introduction to the most important database tables.
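The tables below can also be explored programmatically with the nuScenes devkit. A minimal sketch, assuming the devkit is installed and the v1.0-mini split has been downloaded; the dataroot path is a placeholder:

from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

# Every record is a dict keyed by the fields documented below.
my_sample = nusc.sample[0]

# Foreign keys are resolved with nusc.get(<table name>, <token>).
my_scene = nusc.get('scene', my_sample['scene_token'])
my_log = nusc.get('log', my_scene['log_token'])
print(my_log['location'])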
An attribute is a property of an instance that can change while the category remains the same. Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
attribute {
"token": <str> -- Unique record identifier.
"name": <str> -- Attribute name.
"description": <str> -- Attribute description.
}
Definition of a particular sensor (lidar/radar/camera) as calibrated on a particular vehicle. All extrinsic parameters are given with respect to the ego vehicle body frame. All camera images come undistorted and rectified.
calibrated_sensor {
"token": <str> -- Unique record identifier.
"sensor_token": <str> -- Foreign key pointing to the sensor type.
"translation": <float> [3] -- Coordinate system origin in meters: x, y, z.
"rotation":
<float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
"camera_intrinsic":
<float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
}
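As a sketch of how a calibrated_sensor record can be turned into a homogeneous sensor-to-ego transform, assuming the pyquaternion package that the devkit also relies on:

import numpy as np
from pyquaternion import Quaternion

def sensor_to_ego_matrix(calibrated_sensor):
    # Build a 4x4 sensor-to-ego transform from a calibrated_sensor record.
    T = np.eye(4)
    T[:3, :3] = Quaternion(calibrated_sensor['rotation']).rotation_matrix  # w, x, y, z
    T[:3, 3] = calibrated_sensor['translation']                            # x, y, z in meters
    return T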
Taxonomy of object categories (e.g. vehicle, human). Subcategories are delineated by a period (e.g. human.pedestrian.adult).
category {
"token": <str> -- Unique record identifier.
"name": <str> -- Category name. Subcategories indicated by period.
"description": <str> -- Category description.
"index": <int> -- The index of the label
used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not
exist previously.
}
Ego vehicle pose at a particular timestamp. Given with respect to the global coordinate system of the log's map. The ego_pose is the output of a lidar map-based localization algorithm described in our paper. The localization is 2-dimensional in the x-y plane.
ego_pose {
"token": <str> -- Unique record identifier.
"translation":
<float> [3] -- Coordinate system origin in meters: x, y, z. Note that z is always
0.
"rotation": <float> [4] -- Coordinate system orientation as quaternion:
w, x, y, z.
"timestamp": <int> -- Unix time stamp.
}
An object instance, e.g. a particular vehicle. This table is an enumeration of all object instances we observed. Note that instances are not tracked across scenes.
instance {
"token": <str> -- Unique record identifier.
"category_token": <str> -- Foreign key pointing to the object category.
"nbr_annotations": <int> -- Number of annotations of this instance.
"first_annotation_token":
<str> -- Foreign key. Points to the first annotation of this instance.
"last_annotation_token":
<str> -- Foreign key. Points to the last annotation of this instance.
}
Mapping between nuScenes-lidarseg annotations and sample_datas corresponding to the lidar pointcloud associated with a keyframe.
lidarseg {
"token": <str> -- Unique record identifier.
"filename": <str> -- The name of the .bin files containing the nuScenes-lidarseg
labels. These are numpy arrays of uint8 stored in binary format using numpy.
"sample_data_token":
<str> -- Foreign key. Sample_data corresponding to the annotated lidar pointcloud with
is_key_frame=True.
}
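Since the labels are stored as a raw uint8 array, a record's file can be read with numpy directly. A minimal sketch, assuming the filename is resolved against the dataset root:

import os
import numpy as np

def load_lidarseg_labels(dataroot, lidarseg_record):
    # One uint8 per point of the keyframe lidar pointcloud, holding the category index.
    return np.fromfile(os.path.join(dataroot, lidarseg_record['filename']), dtype=np.uint8)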
Information about the log from which the data was extracted.
log {
"token": <str> -- Unique record identifier.
"logfile": <str> -- Log file name.
"vehicle": <str> -- Vehicle name.
"date_captured": <str> -- Date (YYYY-MM-DD).
"location": <str> -- Area where log was captured, e.g. singapore-onenorth.
}
Map data that is stored as binary semantic masks from a top-down view.
map {
"token": <str> -- Unique record identifier.
"log_tokens": <str> [n] -- Foreign keys.
"category": <str> -- Map category, currently only semantic_prior
for drivable surface and sidewalk.
"filename": <str> -- Relative path to the file with the map mask.
}
A sample is an annotated keyframe at 2 Hz. The data is collected at (approximately) the same timestamp as part of a single LIDAR sweep.
sample {
"token": <str> -- Unique record identifier.
"timestamp": <int> -- Unix time stamp.
"scene_token": <str> -- Foreign key pointing to the scene.
"next":
<str> -- Foreign key. Sample that follows this in time. Empty if end of
scene.
"prev": <str> -- Foreign key. Sample that precedes this in time.
Empty if start of scene.
}
A bounding box defining the position of an object seen in a sample. All location data is given with respect to the global coordinate system.
sample_annotation {
"token": <str> -- Unique record identifier.
"sample_token":
<str> -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations
are done on the sample level taking all relevant sample_data into account.
"instance_token":
<str> -- Foreign key. Which object instance is this annotating. An instance can have
multiple annotations over time.
"attribute_tokens": <str> [n] -- Foreign keys. List
of attributes for this annotation. Attributes can change over time, so they belong here, not
in the instance table.
"visibility_token": <str> -- Foreign key. Visibility may
also change over time. If no visibility is annotated, the token is an empty string.
"translation":
<float> [3] -- Bounding box location in meters as center_x, center_y, center_z.
"size": <float> [3] -- Bounding box size in meters as width, length, height.
"rotation": <float> [4] -- Bounding box orientation as quaternion: w, x, y, z.
"num_lidar_pts": <int> -- Number of lidar points in this box. Points are counted
during the lidar sweep identified with this sample.
"num_radar_pts": <int> --
Number of radar points in this box. Points are counted during the radar sweep identified with
this sample. This number is summed across all radar sensors without any invalid point filtering.
"next": <str> -- Foreign key. Sample annotation from the same object
instance that follows this in time. Empty if this is the last annotation for this object.
"prev": <str> -- Foreign key. Sample annotation from the same object
instance that precedes this in time. Empty if this is the first annotation for this object.
}
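The next and prev foreign keys form a linked list of annotations per object instance. A minimal sketch that collects all annotations of one instance, assuming a loaded NuScenes devkit instance nusc as in the earlier example:

def annotations_of_instance(nusc, instance_token):
    # Walk the next pointers from the first annotation until the chain ends (empty token).
    instance = nusc.get('instance', instance_token)
    annotations = []
    token = instance['first_annotation_token']
    while token != '':
        ann = nusc.get('sample_annotation', token)
        annotations.append(ann)
        token = ann['next']
    return annotations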
Sensor data, e.g. an image, point cloud or radar return. For sample_data with is_key_frame=True, the timestamps should be very close to the sample it points to. For non-keyframes the sample_data points to the sample that follows closest in time.
sample_data {
"token": <str> -- Unique record identifier.
"sample_token": <str> --
Foreign key. Sample to which this sample_data is associated.
"ego_pose_token": <str> -- Foreign key.
"calibrated_sensor_token": <str> -- Foreign key.
"filename": <str> -- Relative path to data-blob on disk.
"fileformat": <str> -- Data file format.
"width":
<int> -- If the sample data is an image, this is the image width in pixels.
"height":
<int> -- If the sample data is an image, this is the image height in pixels.
"timestamp": <int> -- Unix time stamp.
"is_key_frame": <bool> -- True if sample_data is part of key_frame, else False.
"next": <str> -- Foreign key. Sample data from the same sensor that follows
this in time. Empty if end of scene.
"prev": <str> -- Foreign key. Sample
data from the same sensor that precedes this in time. Empty if start of scene.
}
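As a sketch of how the foreign keys of sample_data resolve to a sensor channel, here filtering the keyframe images of one camera, again assuming a loaded NuScenes devkit instance nusc:

def keyframe_images(nusc, channel='CAM_FRONT'):
    # Yield the keyframe sample_data records of one camera channel.
    for sd in nusc.sample_data:
        calib = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])
        sensor = nusc.get('sensor', calib['sensor_token'])
        if sd['is_key_frame'] and sensor['channel'] == channel:
            yield sd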
A scene is a 20s long sequence of consecutive frames extracted from a log. Multiple scenes can come from the same log. Note that object identities (instance tokens) are not preserved across scenes.
scene {
"token": <str> -- Unique record identifier.
"name": <str> -- Short string identifier.
"description": <str> -- Longer description of the scene.
"log_token": <str> -- Foreign key. Points to log from where the data was extracted.
"nbr_samples": <int> -- Number of samples in this scene.
"first_sample_token": <str> -- Foreign key. Points to the first sample in scene.
"last_sample_token": <str> -- Foreign key. Points to the last sample in scene.
}
A specific sensor type.
sensor {
"token": <str> -- Unique record identifier.
"channel": <str> -- Sensor channel name.
"modality": <str> {camera, lidar, radar} -- Sensor modality. Supports category(ies)
in brackets.
}
The visibility of an instance is the fraction of annotation visible in all 6 images. It is binned into 4 bins: 0-40%, 40-60%, 60-80% and 80-100%.
visibility {
"token": <str> -- Unique record identifier.
"level": <str> -- Visibility level.
"description": <str> -- Description of visibility level.
}
We provide a number of tutorials for nuScenes as interactive Jupyter Notebooks in the devkit.
The tutorials are shown here as static pages for users that do not want to download the dataset.
These tutorials cover the basic usage of nuScenes, nuScenes-lidarseg, the map and CAN bus expansions,
as well as the prediction challenge. Use the dropdown menu below to select the tutorial you
want to view.
Alternatively, you can run the tutorials interactively on Colab:
In the first nuScenes release, 3D objects were represented by bounding boxes (cuboids). While useful in many cases, cuboids cannot capture the fine shape details of articulated objects. nuScenes-lidarseg, short for lidar semantic segmentation, offers a higher level of granularity: every single lidar point in the 40,000 keyframes of the nuScenes dataset is annotated with a semantic label, for a total of 1,400,000,000 lidar points labeled with one of 32 classes. In addition to the 23 foreground classes (things) from nuScenes, we have included 9 background classes (stuff). For a detailed definition of every class and example images, please see the annotator instructions for nuScenes and nuScenes-lidarseg.
The taxonomy of nuScenes-lidarseg is compatible with the rest of nuScenes and nuImages, thus enabling a wide range of research across multiple sensor modalities. This is a major step forward for industry and academia alike, as it allows researchers to study and quantify novel problems such as lidar point cloud segmentation, foreground extraction, sensor calibration and mapping using point-level semantics. In the future, we plan to organize various public challenges around these tasks.
nuScenes-lidarseg stands on the shoulders of giants. The academic SemanticKITTI dataset annotates the famous KITTI dataset with lidar segmentation labels for 28 classes. KITTI primarily consists of suburban streets with low traffic density and less challenging traffic situations. Its annotations only cover the front camera rather than the entire 360 degree view. Furthermore, it does not contain radar and is strictly for non-commercial use. nuScenes set out to improve on these aspects, featuring dense data from urban and suburban scenes in Singapore and Boston. It is a multimodal dataset that covers the entire 360 degree view and can be used by commercial entities. Following the initial announcement of nuScenes-lidarseg in October 2019, we have seen a number of other lidar segmentation datasets emerge, such as Hesai's PandaSet, and we look forward to more companies sharing their data with the community.
Just like nuScenes, the nuScenes-lidarseg annotations are free to use, strictly for non-commercial purposes. Non-commercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Examples of non-commercial use include, but are not limited to, personal use, educational use (such as in schools, academies and universities) and some research use. If you intend to use the nuScenes dataset for commercial purposes, we encourage you to contact us for commercial licensing options by sending an email to nuScenes@motional.com.
We hope that this dataset will allow researchers across the world to go even further in the quest to develop safe autonomous driving technology.
Please use the following citation when referencing the dataset:
@ARTICLE{nuscenes2019,
title={nuScenes: A multimodal dataset for autonomous driving},
author={Holger Caesar and Varun Bankiti and Alex H. Lang and Sourabh Vora and
Venice Erin Liong and Qiang Xu and Anush Krishnan and Yu Pan and
Giancarlo Baldan and Oscar Beijbom},
journal={arXiv preprint arXiv:1903.11027},
year={2019}
}