Data
All of CDLI's speech datasets are freely available as open-source resources for researchers, developers, and innovators worldwide. We've collected recordings of non-standard speech in multiple languages from individuals with conditions such as cerebral palsy, stroke, and Parkinson's disease. These datasets include transcriptions and metadata, so anyone can use them to build and train automatic speech recognition (ASR) models that work for voices historically excluded from speech technology. By making this data open and accessible, we're empowering local communities to create their own solutions rather than waiting for global tech companies to serve their needs.
Non-standard speech datasets
This is a collection of our non-standard speech datasets. They are available for download from CDLI's Hugging Face repository.
ASR models for non-standard speech
This is a collection of CDLI's ASR models for non-standard speech. They are available for download from CDLI's Hugging Face repository.