Xiaomi's MiMo-V2.5-Pro Builds a Compiler in Under 5 Hours — and Rivals Claude Opus 4.6 on Coding Benchmarks

Details: By Chris Borden; Category: Models; 1 w; 03 May 2026; 68

With MiMo-V2.5-Pro, Xiaomi has released an AI model that — according to internal tests — writes a complete compiler in under five hours and rivals Anthropic's Claude Opus 4.6 on coding benchmarks, while consuming significantly fewer tokens than its Western competitors.

MiMo-V2.5-Pro is a Mixture-of-Experts model, meaning only a portion of the network is activated for each request rather than the entire model. In total, the system contains 1.02 trillion parameters, with 42 billion active per query. According to the MiMo team, the new version is primarily aimed at long-horizon tasks spanning hours of work and thousands of tool calls.

The context window pushes the boundaries of what is currently possible: the main version processes up to one million tokens at once, while the base variant — without post-training — is capped at 256,000 tokens.

Architecture diagram of MiMo-V2.5. On the left, the audio path with Audio Tokenizer, Local Transformer, and Audio Projector; in the middle, the visual path with MiMo ViT and Visual Projector; on the right, the text path. All three inputs lead to the MiMo Hybrid-SWA backbone, from which the LM Head and MTP Block branch off at the top. Audio, image, and text are each converted into a format that the language model understands using their own encoders. Three translators, then, all feeding into the same backbone. | Image: Xiaomi — Audio, image, and text are each converted into a format that the language model understands using their own encoders. Three translators, then, all feeding into the same backbone. | Image: Xiaomi

A Compiler Built in an Afternoon

Xiaomi illustrates the leap forward over its predecessor with three concrete examples. In the first, the team tasked the model with building a complete compiler project used in coursework at Peking University — an assignment that typically takes computer science students several weeks to complete.

MiMo-V2.5-Pro delivered the project in 4.3 hours through 672 tool calls, achieving a perfect score of 233 out of 233 on the hidden test suite. What Xiaomi highlights is not the score itself, but the approach: the model first constructed the entire pipeline as a scaffold, then worked through each stage layer by layer. Even the very first compile run passed 137 of 233 tests. A later refactoring phase introduced a regression that the model independently diagnosed and resolved.

MiMo-V2.5-Pro processed the compiler in four phases over 4.3 hours, increasing test coverage from 59 percent on the first compile to 100 percent. | Image: Xiaomi

In a second demonstration, MiMo-V2.5-Pro built a desktop video editor of roughly 8,000 lines of code from just a few instructions — working autonomously for 11.5 hours and making around 1,870 tool calls in the process.

A third demonstration connected the model via Claude Code to a circuit simulator and had it design a voltage regulator. Within one hour, the result met all six technical specifications simultaneously — four of which exceeded the model's initial design by roughly an order of magnitude.

Fewer Tokens for Comparable Results

Xiaomi positions MiMo-V2.5-Pro primarily on its performance-to-token-cost ratio. On the in-house agent benchmark ClawEval, the model achieves a 64% success rate using around 70,000 tokens per task run — 40 to 60% fewer than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 require for comparable results.

On coding benchmarks, the model scores 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, and 68.4 on Terminal-Bench 2.0, with 73.7 on the proprietary MiMo Coding Bench. This puts it within striking distance of Claude Opus 4.6 (77.1), while Gemini 3.1 Pro falls back to 67.8. On general agent tasks, MiMo-V2.5-Pro earns 1,581 Elo points on GDPVal-AA and 72.9 on τ³-bench.

On SWE-Bench Pro, MiMo-V2.5-Pro and Claude Opus 4.6 are almost neck and neck, while on Terminal-Bench 2.0 the Xiaomi model pulls ahead.

Progress is most striking on long-context tasks. On OpenAI's GraphWalks benchmark — which requires navigating complex node graphs — the predecessor MiMo-V2-Pro collapsed to zero at one million tokens. MiMo-V2.5-Pro scores 0.37 for breadth-first search and 0.62 for parent-node queries under the same conditions.

The model inherits its technical foundation from MiMo-V2-Flash. A hybrid local-global attention mechanism cuts memory usage on long texts by nearly seven times, while a parallel token prediction mechanism triples output speed. Pre-training ran on 27 trillion tokens, with the context window subsequently extended in stages to one million tokens.

Post-training follows a teacher-student approach: multiple specialist models are first trained separately on domains such as mathematics, safety, and tool use. A single student model then learns from its own solution attempts under the guidance of all specialists, absorbing their combined capabilities.

Three More Models in Tow

Alongside the Pro model, Xiaomi is releasing three additional systems. MiMo-V2.5 is a smaller variant with 310 billion parameters, 15 billion of which are active per query. It natively processes text, images, video, and audio, supports up to one million tokens of context, and was trained on roughly 48 trillion tokens. On the Video-MME benchmark, it scores 87.7 — on par with Gemini 3 Pro, according to Xiaomi. Like the Pro model, it is available as open weights on Hugging Face.

The picture is different for speech synthesis. MiMo-V2.5-TTS is a family of three models: one offering preset voices, one generating new voices from text descriptions, and one cloning voices from short audio clips. Users can control prosody directly in the text using inline tags such as [crying] or [whispers]. These models are not openly available; they can only be accessed via API through Xiaomi's own platform, currently free during a limited period.

The three TTS variants run exclusively via the API and Xiaomi's own MiMo Studio, currently free of charge. Open weights are not available. | Image: Xiaomi

The speech recognition model MiMo-V2.5-ASR, by contrast, is openly available. It operates bilingually in Chinese and English, handles Chinese dialects including Wu, Cantonese, and Hokkien, and can manage mid-sentence language switches as well as song lyrics. On the Open ASR Leaderboard, it achieves an average word error rate of 5.73%.

The gap to Gemini 3.1 Pro is greatest with dialects and Chinese lyrics, at over 16 percentage points. Lower scores are better. | Image: Xiaomi

China's Open-Weight Providers Double Down

With this release, Xiaomi's MiMo team continues the trajectory it set at the end of 2025: many models released simultaneously, largely open, and all designed for autonomously operating AI agents. Next steps outlined by the team include further scaling of training and improved understanding of long-range dependencies beyond individual sentences.

Shortly before this release, Xiaomi had already introduced its first complete three-model package with MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS. The Pro model at the time had previously topped OpenRouter's usage rankings for several days under the anonymous codename "Hunter Alpha," with many users initially suspecting it was a new Deepseek release.

About The Hosts

Chris Borden

AI Analyst & Technology Researcher

AI researcher and industry analyst covering decentralized infrastructure, AI systems, and emerging technology markets. Focused on data-driven analysis, long-term trends, and real-world adoption of artificial intelligence.

Categories

AI News

Categories

AI & Society

Categories

AI Insights

Xiaomi's MiMo-V2.5-Pro Builds a Compiler in Under 5 Hours — and Rivals Claude Opus 4.6 on Coding Benchmarks

A Compiler Built in an Afternoon

Fewer Tokens for Comparable Results

Three More Models in Tow

China's Open-Weight Providers Double Down

About The Hosts

More From Chris Borden