A simple audio/speech dataset consisting of recordings of spoken
digits in WAV files at 8 kHz. The recordings are
trimmed so that they have near-minimal
silence at the beginning and end.
FSDD is an open dataset, which means it will grow over time
as data is contributed. To enable reproducibility and accurate
citation, the dataset is versioned using a Zenodo
DOI as well as git tags.
Please contribute your homemade recordings. All recordings should be
mono 8kHz wav files and be trimmed to have minimal
silence. Don't forget to update metadata.py with the speaker meta-data.
To add your data, follow the recording instructions in
acquire_data/say_numbers_prompt.py and then run
split_and_label_numbers.py to make your files.
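Before contributing, it may help to confirm that a file meets the mono 8 kHz requirement. The following is a minimal sketch using Python's standard `wave` module; the function name `check_recording` is hypothetical and not part of this repository, and the check does not cover silence trimming.

```python
import wave


def check_recording(path, expected_rate=8000, expected_channels=1):
    """Return a list of problems found in the WAV file at `path` (empty if none).

    Hypothetical helper, not part of the repository. It only inspects the
    WAV header (channel count and sample rate).
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != expected_channels:
        problems.append(f"expected {expected_channels} channel(s), found {channels}")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

An empty list means the header matches the stated requirements; any other result lists what to fix before submitting.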
Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav
Example: 7_jackson_32.wav
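As an illustration, a file name such as 7_jackson_32.wav can be split into its digit label, speaker name, and index. This is a sketch only: `parse_recording_name` and `RecordingName` are hypothetical names, and the pattern assumes a single-digit label and speaker names that contain no underscores.

```python
import re
from typing import NamedTuple


class RecordingName(NamedTuple):
    digit: int
    speaker: str
    index: int


# Assumption: one digit, a speaker name without underscores, a numeric index.
_NAME_PATTERN = re.compile(r"^(\d)_([^_]+)_(\d+)\.wav$")


def parse_recording_name(filename):
    """Split a file name like '7_jackson_32.wav' into its three fields."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    digit, speaker, index = match.groups()
    return RecordingName(int(digit), speaker, int(index))
```

For the example above, `parse_recording_name("7_jackson_32.wav")` yields digit 7, speaker "jackson", and index 32; names that do not fit the pattern raise `ValueError` rather than being silently skipped.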
The dataset currently contains 3,000 recordings (50 of each digit per speaker) from 6
speakers, with English pronunciations.
metadata.py contains meta-data regarding the speakers' gender and
accents.