CDLI Research Outputs

A Cookbook for Community-driven Data Collection of Impaired Speech in Low-Resource Languages

Sumaya Ahmed Salihs, Isaac Wiafe, Jamal Abdulai, Gifty Ayoka, Richard Cave, Akon Ekpezu, Catherine Holloway, Katrin Tomanek

This study presents CDLI’s community-driven methodology for creating an impaired speech corpus in a low-resource language (LRL), specifically Akan, which is spoken by around 22 million people, or 80% of the population of Ghana. The project adapted an open-source data collection app, incorporating both image and text prompts appropriate for people living with impaired speech. Data collection involved in-person and virtual methods, with speech and language therapists screening potential participants based on speech severity and cognitive skills. Thirty hours of audio data were collected from people living with cerebral palsy, stammering, and cleft palate. The paper discusses the challenges encountered in data collection and in the transcription of Akan, whose writing system is still evolving. The paper explores adaptation of the open-source Whisper model by fine-tuning a base Akan model (trained on approximately 100 hours of unimpaired speech in Akan) using the collected impaired speech data. Initial results demonstrate a median relative WER reduction of 21.7% on the impaired speech test set, highlighting the significant performance gap of standard ASR on disordered speech (baseline median WER of 84.6%). The study identifies data quality and transcription inconsistencies as key areas for future improvement. The resulting dataset, cookbook, and open-source tools will be publicly available.

In review
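
The abstract above reports a word error rate (WER) and a median relative WER reduction without spelling out the arithmetic. The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. The function names are hypothetical, the transcripts in the usage note are placeholders, and the abstract does not say whether the median is taken per speaker or per utterance, so the sketch simply assumes a list of (baseline WER, fine-tuned WER) pairs.

```python
from statistics import median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Fraction of the baseline WER removed by fine-tuning."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - finetuned_wer) / baseline_wer


def median_relative_reduction(pairs):
    """Median of relative reductions over (baseline, fine-tuned) WER pairs.

    Assumption: one pair per evaluation unit (speaker or utterance).
    """
    return median(relative_wer_reduction(b, f) for b, f in pairs)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Note that a median of relative reductions is generally not the same as the relative reduction between two median WERs, so the two figures quoted in the abstract cannot be combined to recover a fine-tuned median WER.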

How people living with amyotrophic lateral sclerosis use personalized automatic speech recognition technology to support communication

Richard Cave

Amyotrophic lateral sclerosis (ALS), also known as Motor Neurone Disease (MND), is an ultimately fatal disease causing progressive muscular weakness. Most people living with ALS (plwALS) experience speech change (often referred to as dysarthria), eventually becoming unable to communicate using natural speech. Many wish to use speech for as long as possible. Personalized automated speech recognition (ASR) model technology, such as Google's Project Relate, is argued to better recognize speech with dysarthria, supporting maintenance of understanding through real-time captioning. This study examines how plwALS and communication partners use Relate in everyday conversation over a period of up to 12 months and how that use may change with any decline in speech over time. The study videoed interactions between three plwALS and communication partners. We assessed the accuracy of the ASR captions and how well they preserved meaning. Conversation analysis was used to identify participants' own organizational practices in the accomplishment of interaction. Thematic analysis was used to better understand the participants' experiences of using ASR captions. All plwALS reported lower-than-expected ASR accuracy when it was used in conversation and felt ASR captioning was useful in certain contexts. All participants liked the concept of live captioning and were hopeful that future improvements to ASR accuracy may support their communication in everyday life. Training is needed on best practices for customization and practical use of ASR technology and on the limitations of ASR in conversational settings. Support is needed for those less confident with technology and to reduce misplaced allocation of ownership of captioning errors, which risks negative effects on psychological well-being.

Journal of Speech, Language, and Hearing Research, Volume 67, Number 11, Pages 4186-4202


Developing African Language Models for Atypical Speech

Richard Cave, Catherine Holloway, Gifty Ayoka, Katrin Tomanek, Giulia Barbareschi, Victoria Austin

This paper outlines the need for, and ongoing development of, automated speech recognition (ASR) models for people living with impaired speech in African languages, to support innovation in apps and tools for functional use in everyday conversation. While English language ASR models exist for interpreting impaired speech, no known work has addressed language models for African languages. The Centre for Digital Language Inclusion (CDLI) was established to address this gap by creating technologies that support individuals with atypical speech in local languages and cultures, starting with ten African languages. The development of ASR for impaired speech in Low Resource Languages (LRLs) faces significant barriers, primarily due to the lack of recorded speech samples. Existing datasets are almost exclusively in American English, with very limited representation of other languages, and even less of LRLs. English-focused models often exhibit poor accuracy with how English is spoken in Africa. To overcome these challenges, CDLI adopts a community-led, user-centric research practice: partnering with local institutions to collect recordings of impaired speech, developing open-source tools for data collection and ASR model building, and providing technical training. CDLI's key principle is to democratise speech recognition technology by empowering local communities to create their own datasets and AI models. CDLI's work in Ghana with the Akan language serves as a pilot study towards this goal. The longer-term goal is to foster local, autonomous, and sustainable skills for creating inclusive ASR technologies that meet the specific needs of atypical speakers across Africa and beyond.

CHI 2025

Technology is not enough: Exploring the Infrastructure needed for Gaze-based Mobile Communication Technology Adoption

Gifty Ayoka, Giulia Barbareschi, Richard Cave, Catherine Holloway

This study investigates the barriers to the effective adoption of gaze-based Augmentative and Alternative Communication (AAC) technology, specifically Google’s free ‘Look to Speak’ app, in Ghana. While such technology holds promise for individuals with complex communication difficulties, especially in low-resource settings, the research highlights that technology alone is insufficient for successful implementation. The study identifies significant deficits in social, technical, and service infrastructure that impede adoption, potentially outweighing any functional benefits of the technology itself. Critical challenges include a lack of readily available technical support for device maintenance and software updates, coupled with limited access to trained speech and language therapists (SLTs) and caregivers capable of supporting users. The paper emphasises the importance of ‘human infrastructure’, including the availability of informed caregivers and community support, and ‘service infrastructure’, encompassing provision, policy, and personnel within assistive technology services. The research proposes a ‘technology deficit model’ in which inadequacies in human and service infrastructure significantly diminish the potential impact of technological innovations.

Submitted for Publication

Enhancing Communication Equity: evaluation of an automated speech recognition application in Ghana

Gifty Ayoka, Giulia Barbareschi, Richard Cave, Catherine Holloway

This study investigates the practical feasibility and equity of using Google Project Relate in Ghana. Relate is an automatic speech recognition (ASR) application for captioning impaired speech in English. The research addresses the communication barriers faced by individuals with speech difficulties in a context with limited Speech and Language Therapy (SLT) services and assistive technologies. Employing the Technology Amplification Theory, a 6-week user study was conducted with 10 SLTs and 20 adults with communication difficulties to examine differential access, capacity, and motivation in using the application. The results showed that differential access is influenced by smartphone ownership, internet connectivity, and language compatibility, particularly since Relate is available only in English. Differential capacity depends on literacy, ability to create custom speech samples, and social stigma affecting interaction. Differential motivation is determined by individual circumstances and the acceptance of technology-mediated communication by conversation partners. The study highlights the contextual nature of language, the importance of stakeholder engagement beyond users, and the need to acknowledge both the strengths and limitations of ASR technology in functional use. The research provides novel insights into the challenges and opportunities of deploying ASR technology in the Global South, offering recommendations for HCI researchers and developers to enhance communication equity for individuals with non-standard speech. Recommendations include localisation of language models, consideration of the broader social context, and comprehensive user support and training. While ASR tools like Project Relate hold promise, their equitable and effective implementation requires addressing systemic barriers and adapting to local contexts.

Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, Article 394, 1–16.
