After three years of development, Google, in partnership with African research institutes, has finally announced WAXAL.
WAXAL, whose name comes from the Wolof word for “speak,” is an open-source speech dataset covering 21 African languages, including Swahili, Yoruba, and Shona.
Virtual assistants such as Apple’s Siri, Google Assistant, and Amazon’s Alexa mark a notable milestone in natural language processing. Yet AI developers have largely overlooked more than 100 million people in Sub-Saharan Africa, who still cannot use this technology in their native languages. Google aims to reduce this inequality with WAXAL.
A Large-Scale African Language Resource
By collecting more than 11,000 hours of voice data across nearly 2 million recordings, Google and its partners have produced one of the largest open-source datasets focused solely on African languages.
The project is a major step forward for inclusion and linguistic representation in voice-enabled AI. Developers can use the dataset to build Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems, which power products such as voice assistants and automated call centers.
For African startups, WAXAL will lower the cost of building local-language AI products. WAXAL also reduces dependence on foreign datasets that often fail to capture regional dialects.
For Africans, By Africans
Google worked with African institutions, including Makerere University in Uganda, Digital Umuganda in Rwanda, the University of Ghana, and the African Institute for Mathematical Sciences (AIMS). These local partners led the data collection.
Participants recorded speech in their natural accents and speaking styles. At the University of Ghana, more than 7,000 volunteers had their voices recorded for the project.
This local approach improves data quality and cultural accuracy, ensuring that African languages are represented authentically.
Data Ownership and Ethical Collaboration
Rather than simply extracting data, Google set up mutually beneficial partnerships. Each research institution retains full ownership of the data it collected, and as equal collaborators, the organisations can reuse that data for research and education.
This model supports long-term innovation in Africa’s AI ecosystem and encourages ethical, transparent data practices.
Open-Source Format
The full WAXAL dataset is publicly available on Hugging Face under an open license, so anyone can access it free of charge. This levels the playing field for students and startups that may lack the resources to pay for data licenses.
“This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages,” says Aisha Walcott-Bryant, Head of Google Research Africa.
A Growing Movement for African Language Inclusion
WAXAL joins other efforts, such as Lelapa AI and N-ATLAS, that are pioneering the inclusion of African languages in voice and language technology. Together, these initiatives signal the importance of including underrepresented languages in the AI economy.
Main Image: Courtesy of Google

