This week, the Centre for Digital Language Inclusion (CDLI), led by UCL’s Global Disability Innovation Hub (GDI Hub), unveiled its first comprehensive collection of impaired speech datasets on the data platform Hugging Face. The launch marks a pivotal moment in addressing the longstanding underrepresentation of African languages and dialects in artificial intelligence and natural language processing technologies.

Africa Tech Summit Nairobi provided the ideal backdrop for this announcement. As one of the continent's most influential technology conferences, the summit convenes founders, funders, policymakers, and global tech platforms actively shaping how technology is built, funded, and deployed across Africa.

The datasets cover Kenyan English, Swahili, Ugandan English, and Luganda, representing the first African dataset collection specifically focused on impaired speech variations. This distinction is crucial, as it captures the natural variations, code-switching, and dialectal features that characterise everyday communication across East Africa.

"This is a huge milestone for our team and for African language technology," said Dr. Katrin Tomanek, CDLI's AI Tech Lead. "By publishing these datasets, we're addressing a critical gap in AI development and ensuring that the rich linguistic diversity of East Africa including those that have speech differences are recognised and accessible to researchers and developers worldwide."

The datasets are now publicly available as open-source resources on Hugging Face, with an access protocol designed to ensure responsible use while maintaining openness for legitimate research and development purposes.

The release will enable researchers, developers, and organisations to build more inclusive and accurate language technologies that better serve African communities. By capturing authentic language use, including the complex linguistic realities of multilingual societies, these datasets represent a significant step toward AI systems that truly understand how Africans communicate, including those who have impaired speech.

Beyond the dataset launch, CDLI's presence at Africa Tech Summit serves a broader strategic purpose: to showcase tangible work already happening on the ground and to build partnerships that can translate into datasets, programmes, pilots, and long-term collaborations across the continent.

This initiative is led by UCL’s Global Disability Innovation Hub and supported by Google.org and the UK International Development-funded AT2030 Programme. The Centre for Digital Language Inclusion works in collaboration with local and international partners. Technical support has been provided by Modal, whose GPU sponsorship is powering the development of the ASR models. Other collaborators include The Research Center Trustworthy Data Science and Security, Talking Tipps Africa Foundation, Senses Hub, University of Ghana, Strathmore University, and Hogan Lovells.
For more information, visit the CDLI website: www.cdl-inclusion.com

####

ENDS

Kenya
13 Feb 26