MiniMax-M2.5: Built for Real-World Productivity
The frontier coding and agentic model trained across 200,000+ real-world environments. SOTA on SWE-Bench Verified, BrowseComp, and Terminal-Bench with a 200K-token context window.
Run MiniMax-M2.5 continuously for one hour at 100 tokens/s for under $1.

Benchmarks
Industry-leading performance across coding, search, and agentic tasks
- State-of-the-art on the industry-standard code-repair benchmark (SWE-Bench Verified), averaged over 4 runs
- Leading performance on multi-repository, cross-project software engineering tasks
- Best-in-class web research and information retrieval with context management
- Process extensive codebases, long documents, and multi-turn agent sessions in a single 200K-token context
Coding
Architect-level planning meets SOTA execution
MiniMax-M2.5 approaches complex projects the way a senior software architect would: decomposing requirements, planning structure, and designing interfaces before writing a single line of code. This spec-writing behavior emerged naturally during reinforcement learning across 200,000+ real-world environments.
- Trained on 13+ languages including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby
- Covers the full development lifecycle: system design, environment setup, feature development, iteration, code review, and testing
- Full-stack across Web, Android, iOS, and Windows: server APIs, business logic, and databases, not just frontend demos
- On Droid: 79.7 (M2.5) vs 78.9 (Opus 4.6). On OpenCode: 76.1 (M2.5) vs 75.9 (Opus 4.6)
- Upgraded the VIBE benchmark to the more complex Pro version; M2.5 performs on par with Opus 4.5


Search & Tool Calling
Smarter decisions, fewer rounds, better results
MiniMax-M2.5 achieves industry-leading performance on BrowseComp and Wide Search while using approximately 20% fewer reasoning rounds than M2.1. The model has learned more precise search strategies and better token efficiency: it not only reaches the right answer, it finds it along a shorter path.
- 76.3% on BrowseComp with context management, best-in-class web research capability
- Built and evaluated on RISE (Realistic Interactive Search Evaluation) for expert-level search tasks in real-world professional settings
- Stronger generalization across unfamiliar scaffolding environments compared to previous generations
- ~20% fewer rounds across BrowseComp, Wide Search, and RISE compared to M2.1

Office & Finance
Enterprise-grade document intelligence and financial modeling
MiniMax-M2.5 brings frontier-model capabilities to real enterprise workflows. From Excel competitions to financial modeling, the model handles composite instruction constraints and multi-step business processes that demand both precision and domain expertise.
- Evaluated on MEWC (Microsoft Excel World Championship): 179 problems from the 2021–2026 competition divisions
- Financial modeling benchmark with end-to-end research and analysis tasks scored by expert-designed rubrics
- Enhanced handling of composite instruction constraints for complex multi-step office scenarios
- GDPval-MM evaluation shows strong performance with lower average token cost per task

Reasoning & General Intelligence
Efficient reasoning that translates to real-world performance
Trained with reinforcement learning to reason efficiently and decompose tasks optimally, MiniMax-M2.5 delivers strong performance across mathematics, science, and general knowledge benchmarks while maintaining the practical focus that defines the M2 family.
- Competitive performance on AIME 2025, GPQA Diamond, and LiveCodeBench
- Efficient reasoning chains that reduce token consumption without sacrificing accuracy
- Strong results on the Artificial Analysis Intelligence Index leaderboard

Speed & Cost
Intelligence too cheap to meter
MiniMax-M2.5 delivers frontier performance at a fraction of the cost. Trained to reason efficiently, the model completes complex agentic tasks significantly faster while consuming fewer tokens per task.
- Completes SWE-Bench Verified tasks 37% faster, matching Claude Opus 4.6 speed
- Run MiniMax-M2.5 continuously at 100 tokens/second for under $1 per hour
- Achieves better results with ~20% fewer rounds across agentic tasks vs. M2.1
Pricing
- Input: $0.50 / 1M tokens
- Output: $1.50 / 1M tokens
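The "$1/hour" claim follows directly from the listed output price. A quick sketch of the arithmetic, assuming every generated token is billed at the output rate and ignoring input-token cost:

```python
# Cost of running continuously at 100 output tokens/s for one hour,
# using the listed output price of $1.50 per 1M tokens.
TOKENS_PER_SECOND = 100
SECONDS_PER_HOUR = 3600
OUTPUT_PRICE_PER_MILLION = 1.50  # USD

tokens_per_hour = TOKENS_PER_SECOND * SECONDS_PER_HOUR  # 360,000 tokens
cost_per_hour = tokens_per_hour / 1_000_000 * OUTPUT_PRICE_PER_MILLION

print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_hour:.2f}/hour")
```

At 360,000 tokens per hour, the output-side cost works out to roughly $0.54, comfortably under the $1/hour figure; real workloads would add input-token cost on top.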
Appendix
Comprehensive benchmark results
Detailed evaluation data across coding, search, office, reasoning, and general intelligence benchmarks.


Start building with MiniMax-M2.5
Experience SOTA coding, 200K context, and architect-level planning at $0.50 / $1.50 per million tokens.