Data

All of CDLI's speech datasets are freely available as open-source resources for researchers, developers, and innovators worldwide. We've collected recordings of non-standard speech across multiple languages, from individuals with conditions like cerebral palsy, stroke, and Parkinson's disease. These datasets, complete with transcriptions and metadata, enable anyone to build and train ASR models that work for voices historically excluded from speech technology. By making this data open and accessible, we're empowering local communities to create their own solutions rather than waiting for global tech companies to serve their needs.

Non-standard speech datasets

This is a collection of our non-standard speech datasets. They are available for download from CDLI's Hugging Face repository.

ASR models for non-standard speech

This is a collection of our ASR models for non-standard speech. They are available for download from CDLI's Hugging Face repository.