Google Launches WAXAL, an Open Source Dataset for African Languages

Google Introduces Open Source African Language Data Set: WAXAL

After three years of development, Google, in partnership with African research institutes, has finally announced WAXAL. 

WAXAL, named after the Wolof word for “speak,” is an open-source language dataset covering 21 African languages, including Swahili, Yoruba, and Shona.

The development of virtual assistants like Apple’s Siri, Google Assistant, and Amazon’s Alexa is a notable milestone in natural language processing. Yet AI developers have neglected over 100 million people in Sub-Saharan Africa, who cannot use the technology in their native languages. Google aims to reduce this inequality with WAXAL.

A Large-Scale African Language Resource

By collecting over 11,000 hours of voice data and nearly 2 million recordings, Google has produced one of the largest open-source datasets focused solely on African languages.

This project marks a major advancement in inclusion and linguistic representation in voice-enabled AI. Developers can use the dataset to build Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems, which power products such as voice assistants and automated call centers.

For African startups, WAXAL will lower the cost of building local-language AI products. WAXAL also reduces dependence on foreign datasets that often fail to capture regional dialects. 

For Africans, by Africans

Google worked with local African institutions, including Makerere University in Uganda, Digital Umuganda in Rwanda, the University of Ghana, and the African Institute for Mathematical Sciences (AIMS), which led the data collection. 

Participants recorded speech in their natural accents and speaking styles. At the University of Ghana alone, over 7,000 volunteers contributed voice recordings, making the project truly collaborative.

This local approach improves data quality and cultural accuracy, ensuring that African languages are represented authentically. 

Data Ownership and Ethical Collaboration

Rather than simply extracting the data, Google built mutually beneficial partnerships. Each research institution retains full ownership of the data it collected. As equal collaborators, the organisations can reuse the data for research and education.

This model supports long-term innovation in Africa’s AI ecosystem and encourages ethical, transparent data practices. 

Open-Source Format

The full WAXAL dataset is publicly available under an open license on Hugging Face. Anyone can access it free of charge, which levels the playing field. Open access matters most to students and startups that may lack the resources to pay for licensed datasets.

“This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages,” says Aisha Walcott-Bryant, Head of Google Research Africa.

A Growing Movement for African Language Inclusion

This initiative joins other projects, such as Lelapa AI and N-ATLAS, in pioneering the inclusion of African languages in voice-enabled technology. Together, these initiatives signal the importance of including underrepresented languages in the AI economy.

Main Image: Courtesy of Google
