Artificial Analysis has released version 2.0 of its AA-WER speech-to-text benchmark, which measures the accuracy of speech recognition models. In the overall ranking, ElevenLabs’ Scribe v2 takes first place with a word error rate of just 2.3%.
Second and third place go to Google’s Gemini 3 Pro at 2.9% and Mistral’s Voxtral Small at 3.0%, respectively. Other strong performers include Google Gemini 3 Flash at 3.1% and ElevenLabs Scribe v1 at 3.2%. In the middle of the pack are models such as OpenAI’s GPT-4o Transcribe at 4.0% and Whisper Large v3 at 4.2%. Toward the lower end of the ranking are Alibaba’s Qwen3 ASR Flash at 5.9%, Amazon Nova 2 Omni at 6.0%, and Rev AI at 6.1%.
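For context, word error rate is the number of word-level substitutions, insertions, and deletions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. Artificial Analysis does not spell out its exact text normalization here, so the following Python sketch only illustrates the standard definition; the function name and toy sentences are illustrative, not part of the benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )

    return dp[len(ref)][len(hyp)] / len(ref)


# One wrong word out of five reference words gives a WER of 0.2 (20%).
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
```

By this definition, Scribe v2's 2.3% result corresponds to roughly two to three word errors per hundred reference words.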
ElevenLabs Scribe v2 leads the overall AA-WER v2.0 benchmark ranking with the lowest word error rate, followed by Google Gemini 3 Pro and Mistral Voxtral Small. | Image: Artificial Analysis
In AA-AgentTalk, a separate benchmark focused specifically on speech directed at voice assistants, the overall picture remains largely the same. Scribe v2 again leads with a word error rate of 1.6%, followed closely by Gemini 3 Pro at 1.7%. AssemblyAI’s Universal-3 Pro ranks third with 2.3%.
In the AA-AgentTalk test for speech on voice assistants, Scribe v2 from ElevenLabs and Gemini 3 Pro from Google also dominate with the lowest error rates. | Image: Artificial Analysis