VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from
interview videos uploaded to YouTube.
VoxCeleb1 contains over 100,000 utterances for 1,251 celebrities.
To unzip the audio dataset, please fill out this form to request a password.
@InProceedings{Nagrani17,
  author    = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
  title     = "VoxCeleb: a large-scale speaker identification dataset",
  booktitle = "INTERSPEECH",
  year      = "2017",
}