MiniMax-M2 - AI Language Models Tool
Overview
MiniMax-M2 is an open-weight Mixture-of-Experts (MoE) language model optimized for coding and agentic tool use. The model exposes ~230B total parameters while activating roughly 10B per request, trading raw scale for lower latency and better unit economics in agent loops. MiniMax-M2 was released with long-context support, explicit tool/function-calling capabilities, and deployment guides for vLLM, SGLang, MLX, and Hugging Face Transformers, enabling both local hosting and hosted API use. ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2))

MiniMax-M2 targets end-to-end developer workflows: multi-file edits, compile-run-fix loops, test-validated repairs, and multi-step planning across shell, browser, retrieval, and code runners. The authors report strong performance on coding and agentic benchmarks (SWE-bench, Terminal-Bench, BrowseComp, and other agent suites), and provide inference recommendations alongside an interleaved "thinking" format (assistant reasoning wrapped in <think>...</think>) that preserves internal planning state across turns. The Hugging Face release includes safetensors weights, Transformers integration, and documentation for deploying with popular inference stacks. ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2))
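The MoE idea above can be illustrated with a minimal top-k routing sketch. This is a generic illustration of the technique, not MiniMax-M2's actual router implementation; the gating weights are random, and only the hidden size (3072) and the 8-of-256 expert split are taken from the published configuration.

```python
import numpy as np

def topk_route(hidden, gate_w, k=8):
    """Score each token against all experts and keep only the top-k.

    Because just k of the experts run per token, active compute stays far
    below the model's total parameter count.
    """
    logits = hidden @ gate_w                    # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    # Softmax over the selected logits only, as many MoE routers do
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return topk, w

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 3072))    # 4 tokens, hidden_size 3072
gate_w = rng.standard_normal((3072, 256))  # 256 local experts (random gate, for illustration)
experts, weights = topk_route(hidden, gate_w)
print(experts.shape, weights.shape)  # (4, 8) (4, 8)
```

Each token is dispatched to its 8 selected experts and their outputs are combined using the normalized routing weights; the remaining 248 experts contribute no compute for that token.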
Model Statistics
- Downloads: 510,859
- Likes: 1484
- Pipeline: text-generation
- License: other
Model Details
Architecture and configuration: MiniMax-M2 is a MoE-style causal decoder designed for high-throughput agentic workloads. Public configuration values in the Transformers docs include hidden_size 3072, intermediate_size 1536, num_hidden_layers 62, num_attention_heads 48, head_dim 128, num_experts_per_tok 8, and num_local_experts 256. The model supports sliding-window attention with max_position_embeddings up to 196,608 tokens for long-context operation. These parameters reflect the design trade-off between per-request compute (≈10B active params) and a large global parameter count (≈230B). ([huggingface.co](https://huggingface.co/docs/transformers/en/model_doc/minimax_m2))

Tooling and deployment: the authors provide official guides and recommend SGLang, vLLM, MLX-LM, and Hugging Face Transformers for serving and inference (day-0 support and recipes). The model ships in safetensors format, with quantized/merged variants and community-maintained quantizations and adapters also available. Callers are expected to preserve the model's interleaved thinking tokens (<think>...</think>) to maintain performance in multi-step planning and tool chains. The model card and GitHub repository contain further deployment instructions and example tool-calling templates. ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2))
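The relationship between the expert split and the active-parameter count can be checked with back-of-envelope arithmetic over the published figures. This is illustrative only, not an exact parameter count: embeddings, attention, and other shared weights are lumped into the gap between the two fractions.

```python
# Published config and headline figures (from the model card / docs above)
num_local_experts = 256
num_experts_per_tok = 8
total_params = 230e9   # ~230B total
active_params = 10e9   # ~10B active per request

# Fraction of expert weights that actually run for each token
expert_fraction = num_experts_per_tok / num_local_experts
print(f"expert fraction active per token: {expert_fraction:.4f}")      # 0.0312

# Overall active fraction implied by the headline numbers
overall_fraction = active_params / total_params
print(f"overall active fraction: {overall_fraction:.4f}")              # 0.0435

# The overall fraction exceeds the expert fraction because attention and
# other non-expert weights run for every token regardless of routing.
```

In other words, only 1/32 of the experts fire per token, but dense components keep the total active share near 4.3% of the full parameter count.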
Key Features
- Mixture-of-Experts: 230B total params with ~10B active per request for lower latency.
- Designed for coding: multi-file edits, compile-run-fix loops, and test-validated repairs.
- Agent/tool calling: native support for function/tool calling and long-horizon tool chains.
- Ultra-long context: sliding-window attention up to ~196,608 position embeddings for large contexts.
- Multiple deployment recipes: official guides for SGLang, vLLM, MLX-LM and Transformers.
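The interleaved thinking format noted above implies that agent frameworks must send assistant turns back with their <think>...</think> spans intact. The helper below is a hypothetical sketch of that bookkeeping, not an official MiniMax or Transformers API; only the <think>...</think> tag convention comes from the model card.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def append_assistant_turn(history, reply, keep_thinking=True):
    """Store an assistant reply in the conversation history.

    Per the model card guidance, keep_thinking=True preserves the model's
    planning spans so later tool-calling turns stay coherent; stripping
    them (keep_thinking=False) may degrade multi-step performance.
    """
    content = reply if keep_thinking else THINK_RE.sub("", reply).strip()
    history.append({"role": "assistant", "content": content})
    return history

history = [{"role": "user", "content": "List the repo's failing tests."}]
reply = "<think>Need to run pytest first.</think>I'll run the test suite."
append_assistant_turn(history, reply)
print(history[-1]["content"])  # thinking span retained for the next turn
```

A tool-calling loop would append each tool result as a new message and resend the full history, thinking spans included, on every turn.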
Example Usage
Example (python):
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load MiniMax-M2 (requires sufficient GPU memory or device_map="auto")
model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-M2",
    device_map="auto",
    torch_dtype="auto",  # load weights in their native precision
)
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2")
messages = [
{"role": "user", "content": "Write a unit test for a Python function that reverses strings."}
]
# Apply the chat template (preserves assistant thinking if present) and generate
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, excluding the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
# Notes: MiniMax-M2 uses an interleaved thinking format (<think>...</think>) for multi-step planning.
# See the model card and Transformers docs for additional generation/inference parameters and tool-calling guidance. ([huggingface.co](https://huggingface.co/docs/transformers/en/model_doc/minimax_m2))
Benchmarks
- SWE-bench (Verified): 69.4 (Source: ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2)))
- Terminal-Bench: 46.3 (Source: ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2)))
- BrowseComp: 44.0 (Source: ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2)))
- LiveCodeBench (LCB): 83 (Source: ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2)))
- AA Intelligence (composite): 61 (Source: ([huggingface.co](https://huggingface.co/MiniMaxAI/MiniMax-M2)))
Key Information
- Category: Language Models
- Type: AI Language Models Tool