MiniMax-M2 - AI Language Models Tool

Overview

MiniMax-M2 is an open-weight Mixture-of-Experts (MoE) language model optimized for coding and agentic tool use. The model exposes ~230B total parameters while activating roughly 10B per request, trading raw scale for lower latency and better unit economics in agent loops. MiniMax-M2 was released with long-context support, explicit tool/function-calling capabilities, and deployment guides for vLLM, SGLang, MLX, and Hugging Face Transformers, enabling both local hosting and hosted API use. ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2))

MiniMax-M2 targets end-to-end developer workflows: multi-file edits, compile-run-fix loops, test-validated repairs, and multi-step planning across shell, browser, retrieval, and code runners. The model's authors report strong performance on coding and agentic benchmarks (SWE-bench, Terminal-Bench, BrowseComp, and other agent suites), and provide inference recommendations and an interleaved "thinking" format (assistant thinking wrapped in <think>...</think>) to preserve internal planning state across turns. The Hugging Face release includes safetensors weights, Transformers integration, and documentation for deploying with popular inference stacks. ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2))

Model Statistics

  • Downloads: 510,859
  • Likes: 1484
  • Pipeline: text-generation

License: other

Model Details

Architecture and configuration: MiniMax-M2 is a MoE-style causal decoder designed for high-throughput agentic workloads. Public configuration values exposed in the Transformers docs include: hidden_size ~3072, intermediate_size ~1536, num_hidden_layers 62, num_attention_heads 48, head_dim 128, num_experts_per_tok 8, and num_local_experts 256. The model supports sliding-window attention with max_position_embeddings up to 196,608 tokens (long-context operation). These parameters reflect the model's design choices to balance per-request compute (≈10B active params) against a large global parameter count (≈230B). ([huggingface.co](https://huggingface.co/docs/transformers/en/model_doc/minimax_m2))

Tooling and deployment: the authors provide official guides and recommend SGLang, vLLM, MLX-LM, and Hugging Face Transformers for serving and inference (day-0 support and recipes). The model is shipped in safetensors format and includes quantized/merged variants alongside community-maintained quantizations and adapters. The model expects callers to preserve its interleaved thinking tokens (<think>...</think>) to maintain performance in multi-step planning and tool chains. The model card and GitHub repository contain further deployment instructions and example tool-calling templates. ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2))
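As a back-of-envelope check on the ~10B-active figure, the routing fraction implied by the config values above can be computed directly (a sketch only; the true active-parameter count also includes attention, embedding, and any shared weights not modeled here):

```python
# Config values quoted from the Transformers docs above.
num_experts_per_tok = 8
num_local_experts = 256
total_params_b = 230  # ~230B total parameters

# Fraction of expert parameters routed to for each token.
active_fraction = num_experts_per_tok / num_local_experts  # 8 / 256 = 0.03125

# Expert parameters activated per token, ignoring dense (attention/embedding) weights.
approx_active_expert_b = total_params_b * active_fraction
print(active_fraction, round(approx_active_expert_b, 1))  # 0.03125 7.2
```

Under this rough accounting, routed experts contribute ~7B of the active budget; the remainder of the ~10B active figure would come from dense attention, embedding, and shared weights.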

Key Features

  • Mixture-of-Experts: 230B total params with ~10B active per request for lower latency.
  • Designed for coding: multi-file edits, compile-run-fix loops, and test-validated repairs.
  • Agent/tool calling: native support for function/tool calling and long-horizon tool chains.
  • Ultra-long context: sliding-window attention up to ~196,608 position embeddings for large contexts.
  • Multiple deployment recipes: official guides for SGLang, vLLM, MLX-LM and Transformers.
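A minimal sketch of the multi-turn message shape implied by the interleaved thinking format (role names follow the common chat convention; the exact tool-calling schema is defined by the model card's templates and is not reproduced here):

```python
# Hypothetical agent-loop history; prior assistant turns are passed back
# verbatim so the <think>...</think> planning state is preserved.
messages = [
    {"role": "user", "content": "Run the test suite and fix any failures."},
    {
        "role": "assistant",
        "content": "<think>Plan: run pytest, read tracebacks, patch, re-run.</think>"
                   "Running the tests now.",
    },
    {"role": "tool", "content": "2 failed, 10 passed"},
]

# Dropping <think> blocks from earlier turns would discard planning state.
assert "<think>" in messages[1]["content"]
```

The key point is that callers echo earlier assistant turns back unmodified, thinking spans included, rather than stripping them before the next request.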

Example Usage

Example (python):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load MiniMax-M2 (requires sufficient GPU memory or device_map="auto")
model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-M2",
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2")

messages = [
    {"role": "user", "content": "Write a unit test for a Python function that reverses strings."}
]

# Apply the chat template (preserves assistant thinking if present) and generate
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

# Notes: MiniMax-M2 uses an interleaved thinking format (<think>...</think>) for multi-step planning.
# See the model card and Transformers docs for additional generation/inference parameters and tool-calling guidance. ([huggingface.co](https://huggingface.co/docs/transformers/en/model_doc/minimax_m2))
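When showing final answers to end users, callers often remove the thinking spans for display while keeping the raw text in the stored history. A minimal sketch (the tag name follows the format described above; the helper itself is hypothetical):

```python
import re

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> spans for display; keep the raw text in history."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>Outline the test, then write it.</think>Here is the unit test:"
print(strip_thinking(raw))  # Here is the unit test:
```

Note the non-greedy `.*?` with re.DOTALL, so each thinking span is removed individually even when it spans multiple lines.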

Benchmarks

  • SWE-bench (Verified): 69.4
  • Terminal-Bench: 46.3
  • BrowseComp: 44.0
  • LiveCodeBench (LCB): 83
  • AA Intelligence (composite): 61

All figures as reported on the model card. (Source: ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2)))

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool