Integrating into Africa’s AI Landscape
The WAXAL project underscores our commitment to engaging with and contributing to Africa’s growing AI ecosystem. Our data collection efforts, guided by Google experts in data collection methodology, were executed entirely by African academic and community organizations. This collaborative model ensured that the dataset was created by and for the communities it aims to serve, with each partner focusing on different languages and modalities to leverage localized expertise.
Notable collaborators included Makerere University, which gathered Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) data across nine languages, and the University of Ghana, which covered eight languages using an image-prompted ASR data collection approach. Digital Umuganda, in collaboration with Addis Ababa University, led ASR collection for several regional languages. To ensure high-quality audio, teams from Media Trust, Loud n Clear, and the Senegalese Institute of African Mathematical Sciences conducted TTS recordings across a variety of regional dialects.
Our collaborative framework is grounded in the principle that partners retain ownership of the data collected. There is a shared commitment to making all datasets available to the broader community, fostering open access. This approach has already led to notable derivative research and publications within the field.
Through this framework, our collaborators have launched innovative research, including a community-driven cookbook for collecting speech from people with language disorders. This project produced the first open-source dataset of Akan speakers with conditions such as cerebral palsy and stuttering, and demonstrated that in-person image prompts elicit speech more effectively than text-based alternatives. The resulting guidance is a crucial resource for building inclusive voice technologies in low-resource settings.
Additionally, the initiative supported a significant study that introduced a 5,000-hour audio corpus encompassing five Ghanaian languages: Akan, Ewe, Dagbani, Dagare, and Ikposo. This effort established a robust infrastructure for building ASR and TTS systems specifically designed to accommodate the linguistic diversity of West Africa, capturing natural, spontaneous intonations through a controlled crowdsourcing methodology.
Other research has benchmarked four advanced models (Whisper, XLS-R, MMS, and W2v-BERT) across 13 African languages. This study analyzed how performance changes as training data grows, shedding light on data efficiency and showing that the gains from scaling depend on the complexity of the language and how well the training data matches the target domain.
Finally, a thorough literature review cataloged 74 datasets spanning 111 African languages, mapping the current landscape in voice technology. This review underscored an urgent need for multidomain conversational corpora and advocated for the adoption of language-specific metrics, such as character error rate (CER), to enhance performance assessment in morphologically rich and tonal language contexts.
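Character error rate is a standard metric: the character-level edit distance between the model’s transcript and the reference, divided by the reference length. The sketch below is a minimal illustration (not code from the studies above) showing how CER differs from word error rate (WER), which is what makes it better suited to morphologically rich languages where a single wrong affix flips an entire word.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: the same distance over word tokens."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)
```

On a one-character mistake inside a long word, WER charges a full word error while CER charges only a fraction, so CER gives a finer-grained picture of model quality for agglutinative or tonal orthographies.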
