Kimi-K2-Instruct - AI Language Models Tool
Overview
Kimi-K2-Instruct is Moonshot AI’s instruction-tuned variant of the Kimi K2 Mixture-of-Experts (MoE) family. It combines a 1-trillion-parameter MoE backbone with sparse activation (≈32B activated parameters per forward pass) and an extended long-context window to target chat, coding, math/reasoning, and agentic tool workflows. The release emphasizes practical deployment: block-FP8 checkpoints on Hugging Face, OpenAI/Anthropic-compatible APIs, and examples for chat completion and tool calling.
Built and evaluated with a heavy focus on agentic use (autonomous tool invocation, multi-step program synthesis, and long-horizon reasoning), Kimi-K2-Instruct ships with a detailed model card and tech report describing architecture choices, training scale, and benchmark results. The model card also provides deployment recommendations (vLLM, SGLang, KTransformers, TensorRT-LLM) and sample code for chat and function/tool calling, making it suitable both for self-hosting and for Moonshot’s hosted API. (Source: Hugging Face model card and Moonshot AI tech report.) ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
Model Statistics
- Downloads: 192,938
- Likes: 2,323
- Pipeline: text-generation
- License: other
Model Details
- Architecture: Mixture-of-Experts (MoE) Transformer with 1T total parameters and sparse activation (~32B active per forward pass).
- Layers and experts: the published model summary lists 61 total layers (1 dense), 384 experts with 8 selected per token, 64 attention heads, an attention hidden dimension of 7168, and a per-expert hidden dimension of 2048.
- Activation and attention: SwiGLU activation; the attention variant is listed as MLA.
- Vocabulary: reported as ~160K tokens.
- Context window: the instruct variant supports an ultra-long context (the official model card lists 128K tokens for the main K2-Instruct releases; later K2 variants expand context further).
- Training and optimizer: pretraining across large-scale corpora (the team reports ~15.5T training tokens), using the Muon/MuonClip optimizer family and the scale-stability techniques described in the project tech report.
- Model artifacts: checkpoints are published in block-FP8 format on Hugging Face to enable efficient storage and faster inference on compatible runtimes.
- Deployment and runtime compatibility: Moonshot provides an OpenAI/Anthropic-compatible API surface (including an Anthropic temperature-mapping note) and recommends the inference engines vLLM, SGLang, KTransformers, and TensorRT-LLM for best performance. The model card includes chat and tool-calling examples and explains that the available tool list must be passed per request so the model can autonomously decide when to invoke tools.
(Sources: Hugging Face model card; Moonshot tech report and docs.) ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
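To make the routing figures concrete, here is an illustrative top-k expert-selection sketch (not Moonshot's actual code): 8 experts are chosen from a pool of 384 per token, and the selected scores are softmax-normalized. The Gaussian router scores are a stand-in for a learned router.

```python
import math
import random

# Figures from the model summary above
NUM_EXPERTS = 384
TOP_K = 8

def route(logits):
    """Pick the TOP_K highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    m = max(logits[i] for i in top)            # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router scores
experts, weights = route(logits)
print(len(experts), round(sum(weights), 6))  # → 8 1.0
```

Only the 8 selected experts run for a given token, which is why per-token compute tracks the ~32B activated parameters rather than the 1T total.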
Key Features
- 1 trillion-parameter MoE backbone with sparse activation (~32B active per request).
- Large expert pool: 384 experts, selecting 8 experts per token for efficient compute.
- Ultra-long context support (officially 128K tokens for K2-Instruct variants; later variants extend further).
- Instruction-tuned for chat and agentic tool use, with native function/tool-calling support.
- Block-FP8 (block-fp8) checkpoints published on Hugging Face for efficient storage and inference.
- OpenAI/Anthropic-compatible API surface and Anthropic temperature mapping guidance.
- Recommended inference runtimes: vLLM, SGLang, KTransformers, and TensorRT-LLM.
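As a quick sanity check on the sparse-activation figures above (a back-of-envelope calculation, not a number from the model card):

```python
TOTAL_PARAMS = 1_000_000_000_000   # ~1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # ~32B activated per forward pass

# Fraction of parameters exercised per token
print(f"{ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")  # → 3.2%
```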
Example Usage
Example (python):
from openai import OpenAI
import json

# Example: simple chat + tool-calling loop (OpenAI-compatible client)
# Adapt the client import to your OpenAI-compatible SDK if needed.
client = OpenAI(api_key="YOUR_API_KEY")
model_name = "moonshotai/Kimi-K2-Instruct"

# Simple chat completion example
messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
    {"role": "user", "content": "Please give a brief self-introduction."}
]
resp = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    max_tokens=256
)
print(resp.choices[0].message.content)

# Tool-calling example (local tool mapping)
def get_weather(city: str) -> dict:
    return {"weather": "Sunny", "city": city}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve current weather information.",
        "parameters": {"type": "object", "required": ["city"], "properties": {"city": {"type": "string"}}}
    }
}]

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "What's the weather in Beijing? Use the tool to check."}
]
finish_reason = None
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.6,
        tools=tools,
        tool_choice="auto"
    )
    choice = completion.choices[0]
    finish_reason = choice.finish_reason
    if finish_reason == "tool_calls":
        # Echo the assistant's tool-call message, then append each tool result
        messages.append(choice.message)
        for tool_call in choice.message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = get_weather(**args)
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "name": tool_call.function.name, "content": json.dumps(result)})

print("--- final response ---")
print(choice.message.content)
# Example adapted from the model card and deployment guide on Hugging Face.
# (See model card for streaming and advanced patterns.)
Pricing
Moonshot provides a hosted API for Kimi models (OpenAI-compatible) and has announced pricing updates on its platform. Moonshot’s blog notes reduced input pricing and tier/rate-limit changes, but per-token rates depend on plan/provider. Third-party aggregators and marketplaces list examples (input/output token rates vary; e.g., several vendors list input ~$0.10–$0.60 per 1M tokens and output ~$2.50–$3.00 per 1M tokens), and some platform-specific subscription tiers are shown on community pages. Because official, consistently published numeric rates vary by region/provider and change over time, consult the Moonshot platform docs or your chosen inference provider for exact, up-to-date billing details. (See Moonshot platform announcement and representative aggregator listings.) ([platform.moonshot.ai](https://platform.moonshot.ai/blog/posts/Kimi_API_Newsletter))
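Given per-1M-token rates from your provider, estimating a request's cost is simple arithmetic. The rates below are taken from the aggregator ranges quoted above, not official Moonshot pricing; check the platform docs for current billing.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost of one request, given per-1M-token rates."""
    return (input_tokens * input_rate_per_m +
            output_tokens * output_rate_per_m) / 1_000_000

# e.g. 50K input + 2K output tokens at $0.60 / $3.00 per 1M tokens
# (illustrative aggregator-range rates, not official pricing)
cost = estimate_cost(50_000, 2_000, 0.60, 3.00)
print(f"${cost:.4f}")  # → $0.0360
```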
Benchmarks
LiveCodeBench v6 (Pass@1): 53.7% (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
SWE-bench Verified (Agentic Coding, Single Attempt Pass@1): 65.8% (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
MMLU (Exact Match): 89.5 EM (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
AIME 2024 (Avg@64): 69.6 (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
Downloads (last month, Hugging Face): 192,938 downloads (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
Key Information
- Category: Language Models
- Type: AI Language Models Tool