The field of artificial intelligence has seen remarkable advances in recent years, but one significant challenge has persisted: most AI language models are heavily focused on English and struggle with low-resource and non-dominant languages, limiting AI's ability to serve a truly global audience. A new development from Cohere for AI (C4AI), a non-profit AI research organization, promises to change that. C4AI has announced the open release of Aya 23, a family of state-of-the-art multilingual language models that aims to bridge the language gap and make AI more inclusive. Available in 8-billion (8B) and 35-billion (35B) parameter versions, the models are designed to deliver robust performance across 23 languages spoken by nearly half of the world's population.
The Journey to Aya 23: Overcoming Key Challenges
The path to creating effective multilingual AI models has been full of obstacles. C4AI researchers identified two major roadblocks:
- Lack of Robust Multilingual Pre-trained Models: Most existing language models were primarily trained on English data, resulting in poor performance when dealing with other languages, especially low-resource ones.
- Scarcity of Multilingual Training Data: There was a dearth of high-quality, instruction-style training data that covered a diverse range of languages, limiting the models’ ability to understand and respond to prompts in multiple languages.
To tackle these challenges, C4AI launched the Aya initiative, a groundbreaking collaboration that brought together over 3,000 independent researchers from 119 countries. This global effort resulted in the creation of the Aya Collection, a massive dataset consisting of 513 million instances of prompts and completions spanning multiple languages.
Today, we launch Aya 23, a state-of-art multilingual 8B and 35B open weights release.
Aya 23 pairs a highly performant pre-trained model with the recent Aya dataset, making multilingual generative AI breakthroughs accessible to the research community. 🌍https://t.co/9HsmypAbBb
— Cohere For AI (@CohereForAI) May 23, 2024
The Aya 101 Breakthrough and Its Limitations
Using the Aya Collection, C4AI developed Aya 101, a pioneering open-source language model that supported an impressive 101 languages. Released in February 2024, Aya 101 marked a significant step forward in massively multilingual language modeling.
However, Aya 101 had its limitations. It was built on an older base model and prioritized breadth over depth, spreading its capacity so thinly across 101 languages that per-language performance suffered.
Aya 23: Balancing Breadth and Depth
With Aya 23, C4AI has addressed the shortcomings of Aya 101, striking a balance between breadth and depth. Instead of trying to cover an excessive number of languages, Aya 23 focuses on delivering superior performance across 23 carefully selected languages:
- Arabic
- Chinese (simplified & traditional)
- Czech
- Dutch
- English
- French
- German
- Greek
- Hebrew
- Hindi
- Indonesian
- Italian
- Japanese
- Korean
- Persian
- Polish
- Portuguese
- Romanian
- Russian
- Spanish
- Turkish
- Ukrainian
- Vietnamese
By allocating more capacity to fewer languages, Aya 23 can generate higher-quality responses across these diverse tongues.
Impressive Performance and Benchmarking
The results of Aya 23 have been impressive. In benchmarks, the 35B-parameter version outperformed other widely used open-source models, achieving top results across all 23 covered languages.
Compared to its predecessor, Aya 101, Aya 23 demonstrated significant improvements:
- Up to 14% better performance on discriminative tasks (e.g., language understanding)
- Up to 20% better performance on generative tasks (e.g., text generation)
- Up to 41.6% improvement on multilingual MMLU (Massive Multitask Language Understanding)
- 6.6 times better at multilingual mathematical reasoning
Moreover, in head-to-head comparisons judged both by human annotators and by other language models acting as automatic evaluators, the Aya 23 models were consistently preferred.
Democratizing Access: Open Weights and Free Trials
In line with C4AI's commitment to advancing inclusive AI, the organization has released the open weights for both the 8B and 35B versions of Aya 23 on Hugging Face, a popular platform for machine learning models. The release is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0), allowing researchers and developers to experiment with and build upon these models for non-commercial purposes.
To further democratize access, users can try out the new Aya 23 models for free on the Cohere Playground, a user-friendly interface for interacting with and testing the models’ capabilities.
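For readers who prefer to experiment programmatically rather than through the Playground, the open weights can be loaded with the Hugging Face `transformers` library. The following is a minimal sketch, not an official recipe: the `CohereForAI/aya-23-8B` model ID reflects the naming used on the Hugging Face release, and the chat-template workflow is the standard `transformers` pattern; consult the model card for the definitive usage instructions.

```python
def build_chat(user_message):
    """Wrap a single user prompt in the chat-message format that
    tokenizer.apply_chat_template expects."""
    return [{"role": "user", "content": user_message}]


def generate_reply(user_message, model_id="CohereForAI/aya-23-8B"):
    """Sketch: download the Aya 23 weights (multi-gigabyte; a GPU is
    strongly recommended) and generate a reply to a single prompt.
    Requires the `transformers` and `torch` packages."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Render the chat messages into model input tokens.
    input_ids = tokenizer.apply_chat_template(
        build_chat(user_message),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(input_ids, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Example call in one of the 23 covered languages (Spanish); not executed
# here, since the first call triggers the full weight download:
# print(generate_reply("¿Cuáles son los beneficios de los modelos multilingües?"))
```

Swapping `model_id` for `CohereForAI/aya-23-35B` would target the larger variant under the same workflow, at a correspondingly higher memory cost.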
Empowering Researchers and Practitioners
By releasing the open weights of Aya 23, C4AI aims to empower researchers and practitioners worldwide to advance multilingual models and applications. This open access approach fosters collaboration, innovation, and the development of more inclusive AI solutions that can benefit people across diverse linguistic backgrounds.
The Future of Multilingual AI
The release of Aya 23 represents a major milestone in the journey towards truly inclusive AI. With its impressive performance and commitment to open access, this development has the potential to accelerate research and development in the field of multilingual language modeling.
As AI continues to permeate various aspects of our lives, from virtual assistants to content generation and beyond, the ability to communicate effectively across languages becomes increasingly crucial. Aya 23 paves the way for a future where language barriers are no longer an obstacle, enabling AI to serve and empower people regardless of their native tongue.
While there is still work to be done, the collaborative efforts of organizations like Cohere for AI and the global research community are bringing us closer to a world where the benefits of AI are accessible to all, transcending linguistic boundaries.