Free Spoken Digit

Overview

A simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends.
FSDD is an open dataset, which means it will grow over time as data is contributed. In order to enable reproducibility and accurate citation the dataset is versioned using Zenodo DOI as well as git tags.

Data Collection

Please contribute your homemade recordings. All recordings should be mono 8kHz wav files and be trimmed to have minimal silence. Don't forget to update metadata.py with the speaker meta-data.
To add your data, follow the recording instructions in acquire_data/say_numbers_prompt.py and then run split_and_label_numbers.py to make your files.
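Before contributing, it may help to confirm that a recording meets the stated format (mono, 8kHz, wav). The following is a minimal sketch using Python's standard `wave` module; the function name `check_recording` is hypothetical and not part of the FSDD repository, and the check does not verify that silence has been trimmed.

```python
import wave


def check_recording(path, expected_rate=8000, expected_channels=1):
    """Return a list of format problems for a WAV file (empty if none).

    `path` may be a file path or an open binary file object.
    The defaults reflect the stated requirements: mono, 8 kHz.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != expected_channels:
            problems.append(
                f"expected {expected_channels} channel(s), got {channels}"
            )
        if rate != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {rate} Hz")
    return problems


# Hypothetical usage on a local file:
# for problem in check_recording("7_jackson_32.wav"):
#     print(problem)
```

An empty list means the channel count and sample rate match the contribution guidelines.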

Data Format

Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav. Example: 7_jackson_32.wav
The dataset currently contains 3,000 recordings from 6 speakers with English pronunciations (50 recordings of each digit per speaker).
metadata.py contains meta-data about the speakers' gender and accents.
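Because the label, speaker, and index are all encoded in the file name, they can be recovered without opening the audio. The sketch below assumes the underscore-separated pattern seen in the example 7_jackson_32.wav; the names `Recording` and `parse_filename` are hypothetical helpers, not part of the dataset's own code.

```python
from pathlib import Path
from typing import NamedTuple


class Recording(NamedTuple):
    digit: int      # spoken digit label, e.g. 7
    speaker: str    # speaker name, e.g. "jackson"
    index: int      # recording index for that digit and speaker


def parse_filename(name):
    """Split a name like '7_jackson_32.wav' into its three fields."""
    stem = Path(name).stem                  # drop directory and ".wav"
    digit, rest = stem.split("_", 1)        # first field is the digit
    speaker, index = rest.rsplit("_", 1)    # last field is the index
    return Recording(int(digit), speaker, int(index))


print(parse_filename("7_jackson_32.wav"))
# Recording(digit=7, speaker='jackson', index=32)
```

Splitting from both ends keeps the parser working even if a speaker name were to contain an underscore.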

🎉 Many thanks to Graviti Open Datasets for contributing the dataset.

Basic Information

Application Scenarios: Not Available
Annotations: Classification
Tasks: Voice Print Recognition, ASR
License: CC BY-SA 4.0
Updated on: 2022-02-11 08:13:58

Metadata

Data Type: Audio
Data Volume: 6,000
Annotation Amount: 6,000
File Size: 20.15MB
Copyright Owner: Zohar Jackson
Annotator: Unknown