CDLI | Glossary of Terms

Accessibility

The design of technologies, services, and environments so they can be used by people with diverse abilities, including people with disabilities.

Assistive Technology

Tools or systems that help people with disabilities perform tasks more easily. Inclusive speech recognition can function as assistive technology for communication.

Automatic Speech Recognition (ASR)

Technology that converts spoken language into written text. ASR enables devices and software to "understand" human speech and is used in tools such as voice assistants, transcription services, and accessibility technologies.

Community-Centred Design

An approach that actively involves the people who will use a technology in designing, developing, and testing it, so that solutions reflect real needs and lived experiences.

Data Annotation

The process of labelling speech recordings with accurate text or metadata. Annotation helps machines learn how spoken words relate to written language.

Digital Language Inclusion

The practice of ensuring that digital technologies recognise, understand, and support all ways people speak, including diverse languages, accents, and speech patterns.

Ethical Data Collection

The practice of collecting data with informed consent, transparency, privacy protection, and respect for participants' rights and dignity.

Inclusive AI

Artificial intelligence (AI) systems designed to work fairly and effectively for people of different abilities, languages, and backgrounds, without excluding or disadvantaging certain groups.

Machine Learning

A branch of AI that allows computer systems to learn patterns from data and improve their performance over time without being explicitly programmed for every task.

Non-Standard Speech

Speech that differs from what most speech recognition systems are trained on. This may include speech affected by disability, neurological conditions, injury, accent variation, or local language patterns.

Open-Source

Software, data, or models that are freely available for anyone to use, study, modify, and share. CDLI promotes open-source approaches to accelerate inclusive innovation.

Speech Dataset

A structured collection of recorded speech samples and corresponding text. These datasets are used to train and improve speech recognition models.
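
As an illustration only, a speech dataset can be pictured as a list of entries, each pairing one recording with its text. The sketch below is a minimal example in Python; the field names (audio_path, transcript, metadata), the file paths, and the helper missing_transcripts are hypothetical and do not describe any format used by CDLI.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechSample:
    """One dataset entry: a recording paired with its text (hypothetical layout)."""
    audio_path: str                                # location of the recorded speech
    transcript: str                                # the corresponding written text
    metadata: dict = field(default_factory=dict)   # optional labels, e.g. language


# A tiny illustrative dataset; the paths and transcripts are made up.
dataset = [
    SpeechSample("recordings/sample_001.wav", "good morning", {"language": "en"}),
    SpeechSample("recordings/sample_002.wav", "thank you", {"language": "en"}),
]


def missing_transcripts(samples):
    """Return the audio paths of entries whose transcript is empty or blank."""
    return [s.audio_path for s in samples if not s.transcript.strip()]
```

A check like missing_transcripts is one simple way to confirm that every recording has corresponding text before the dataset is used for training.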

Speech Impairment

A condition that affects a person's ability to produce clear or typical speech. This can be temporary or lifelong and may result from disability, illness, injury, or neurological differences.

Speech Model

A trained AI system that processes and recognises spoken language. Speech models generally become more accurate when they are trained on more diverse and representative speech data.

Under-Resourced Languages

Languages that lack sufficient digital data, tools, or technological support. Many African languages fall into this category, making them poorly supported by mainstream speech technologies.