Kimi-K2-Instruct - AI Language Models Tool
Overview
Kimi-K2-Instruct is Moonshot AI’s instruction-tuned variant of the Kimi K2 Mixture‑of‑Experts (MoE) family: a 1‑trillion‑parameter architecture that activates ≈32B parameters per forward pass, optimized for chat, coding agents, and tool calling. The model card and documentation emphasize agentic workflows (autonomous tool calls, function/schema‑style tool definitions) and provide OpenAI/Anthropic‑compatible API examples plus block‑FP8 safetensors weights on Hugging Face for self‑hosting and third‑party serving. ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
Kimi‑K2‑Instruct is positioned for production agent use: it ships with instruction‑tuned prompts and a chat template, a tool‑calling guide (covering automatic tool selection and the tool_call loop), and recommendations for runtimes such as vLLM, SGLang, KTransformers, and TensorRT‑LLM. Benchmarks included in the model card show strong coding, math, and tool‑use performance, and independent tech press coverage highlights the model’s agentic strengths and state‑of‑the‑art benchmark placements among recent open models. Community reports indicate excellent coding and long‑context behavior in many cases, with some users noting variability in style and safety steering depending on deployment and presets. ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
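Because the API surface is OpenAI-compatible, the standard OpenAI Python SDK can be pointed at Moonshot's platform. A minimal sketch, assuming the openai>=1.0 client and the api.moonshot.ai base URL from the platform docs; model IDs may differ between Moonshot's platform and third-party hosts:
from openai import OpenAI

# Point the standard OpenAI client at Moonshot's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="MOONSHOT_API_KEY",  # your Moonshot platform key
    base_url="https://api.moonshot.ai/v1",
)

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",  # HF repo ID; hosted IDs may differ
    messages=[{"role": "user", "content": "In one sentence, what is Kimi K2?"}],
    temperature=0.6,
)
print(resp.choices[0].message.content)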
Model Statistics
- Downloads: 60,794
- Likes: 2297
- Pipeline: text-generation
- Parameters: 1026.5B
- License: other
Model Details
Architecture and core specs: Kimi‑K2 is a sparse Mixture‑of‑Experts transformer with ~1T total parameters and ≈32B activated parameters per token. The public model summary lists 61 layers, a global hidden dimension of 7168, an expert hidden dimension of 2048, 384 experts with 8 selected per token, 64 attention heads, a 160k vocabulary, and a 128k‑token context window for the standard Kimi‑K2 variants. The activation function is SwiGLU, and the model uses a Multi‑head Latent Attention (MLA) variant. ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
Training and tooling: Moonshot reports pretraining on ~15.5T tokens using the custom Muon/MuonClip optimizer family to stabilize trillion‑scale MoE training. The Instruct checkpoint is post‑trained for instruction following, tool calling, and reflex‑grade chat (fast responses without long thinking); the K2‑0905 variant expands capabilities and extends context to a reported 256k in later releases. Checkpoints on Hugging Face are provided in block‑FP8 formats (F8_E4M3 safetensors), enabling lower‑memory self‑hosting and faster third‑party serving. Recommended inference engines and production runtimes include vLLM, SGLang, KTransformers, and TensorRT‑LLM. ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
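To make the sparse-activation arithmetic concrete, a small sketch encoding the published specs and a top-k routing step: each token is dispatched to only 8 of the 384 experts, which is why roughly 32B of the ~1T parameters are active per token. The routing code is an illustrative stand-in, not Moonshot's implementation:
import torch

# Published Kimi-K2 specs from the model card (constants for illustration).
N_LAYERS, D_MODEL, D_EXPERT = 61, 7168, 2048
N_EXPERTS, TOP_K = 384, 8
VOCAB, CONTEXT = 160_000, 128_000

def route(hidden: torch.Tensor, router_w: torch.Tensor):
    # Score every expert per token, then keep only the top-k of them.
    logits = hidden @ router_w                      # (tokens, n_experts)
    weights, experts = logits.topk(TOP_K, dim=-1)   # 8 experts per token
    return torch.softmax(weights, dim=-1), experts  # mixing weights, expert IDs

h = torch.randn(4, D_MODEL)
w = torch.randn(D_MODEL, N_EXPERTS)
mix, ids = route(h, w)
print(ids.shape)  # torch.Size([4, 8]): 8 selected experts per token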
Key Features
- Trillion‑parameter MoE with ~32B activated parameters per inference.
- Instruction‑tuned Instruct checkpoint optimized for chat and agentic tool use.
- Large context: standard 128k tokens (K2‑0905 variants report up to 256k).
- Block‑FP8 safetensors checkpoints available on Hugging Face for efficient hosting (see the serving sketch after this list).
- OpenAI/Anthropic‑compatible API surface for drop‑in integrations.
- Built for tool calling: automatic tool selection and function‑style tool schemas.
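For self-hosting, a minimal sketch with vLLM, one of the runtimes the model card recommends. This assumes a recent vLLM build with Kimi‑K2 support; tensor_parallel_size and max_model_len are illustrative values to adjust for your cluster:
from vllm import LLM, SamplingParams

# Load the FP8 safetensors checkpoint from Hugging Face. A model this size
# needs a multi-GPU node; tensor_parallel_size below is illustrative only.
llm = LLM(
    model="moonshotai/Kimi-K2-Instruct",
    trust_remote_code=True,   # custom Kimi architecture code
    tensor_parallel_size=16,  # adjust to your GPU count
    max_model_len=128_000,    # standard K2 context window
)

out = llm.chat(
    [{"role": "user", "content": "Write a haiku about sparse experts."}],
    SamplingParams(temperature=0.6, max_tokens=64),
)
print(out[0].outputs[0].text)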
Example Usage
Example (python):
import requests
import json

# Example: OpenAI-compatible chat completion against Moonshot's platform.
# Replace MOONSHOT_API_KEY with your key. The base URL below follows the
# platform docs' OpenAI-compatible convention; confirm the exact endpoint
# and model ID for your provider before use.
API_KEY = "MOONSHOT_API_KEY"
MODEL = "moonshotai/Kimi-K2-Instruct"  # HF repo ID; hosted providers may use a different model ID
ENDPOINT = "https://api.moonshot.ai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Simple chat completion
messages = [
    {"role": "system", "content": "You are Kimi, an assistant optimized for tool use."},
    {"role": "user", "content": "Please summarize the attached repository and tell me if tests will likely pass."},
]
payload = {
    "model": MODEL,
    "messages": messages,
    "temperature": 0.6,
    "max_tokens": 400,
}
r = requests.post(ENDPOINT, headers=headers, json=payload)
print(r.status_code, r.text[:1000])
# Example: tool calling (weather); the model decides when to call the tool.
# Define a function-style tool schema and map it to a local implementation.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return current weather for a city",
            "parameters": {
                "type": "object",
                "required": ["city"],
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]
payload_tools = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are Kimi, an assistant that may call tools."},
        {"role": "user", "content": "What's the weather in San Francisco? Use the tool."},
    ],
    "tools": tools,
    "tool_choice": "auto",
    "temperature": 0.6,
}
resp = requests.post(ENDPOINT, headers=headers, json=payload_tools)
resp_json = resp.json()

# When the model requests a tool, the response message carries "tool_calls"
# entries; map each to a local function, execute it, and return the result.
print(json.dumps(resp_json, indent=2)[:2000])
# Note: adapt streaming, tool-call loops, and message appending per your chosen OpenAI-compatible client.
# For details, follow the Tool Calling guide on the Hugging Face model card.
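The note above mentions the tool-call loop; here is a minimal sketch of one, reusing MODEL, ENDPOINT, headers, and tools from the example and assuming the OpenAI-compatible response shape (choices[0].message.tool_calls). The get_weather function is a hypothetical local stand-in:
def get_weather(city: str) -> dict:
    # Hypothetical local implementation; swap in a real weather API.
    return {"city": city, "condition": "sunny", "temp_c": 18}

TOOL_IMPLS = {"get_weather": get_weather}

def run_tool_loop(messages, max_rounds=5):
    # Call the model repeatedly until it stops requesting tools.
    for _ in range(max_rounds):
        body = {"model": MODEL, "messages": messages, "tools": tools, "tool_choice": "auto"}
        msg = requests.post(ENDPOINT, headers=headers, json=body).json()["choices"][0]["message"]
        messages.append(msg)
        if not msg.get("tool_calls"):
            return msg["content"]  # final natural-language answer
        for call in msg["tool_calls"]:
            fn = call["function"]
            result = TOOL_IMPLS[fn["name"]](**json.loads(fn["arguments"]))
            # Feed the tool result back, keyed to the originating call ID.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    raise RuntimeError("Tool loop did not terminate within max_rounds")
Calling run_tool_loop(payload_tools["messages"]) would drive the weather exchange above to completion.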
Pricing
Moonshot's official public platform pricing for Kimi‑K2 is not published directly on the Hugging Face model card. Multiple third‑party inference providers publish per‑million‑token rates for hosted Kimi‑K2 variants; examples include DeepInfra (~$0.40–$0.50 input / $2.00 output per 1M tokens) and Groq (commonly listed at $1.00 input / $3.00 output per 1M tokens), with other vendors offering different blends and cached‑token discounts. Self‑hosting with the FP8 safetensors weights on Hugging Face avoids provider token fees but requires substantial GPU infrastructure. Prices vary by provider, region, and model version; check the chosen inference provider (DeepInfra, Groq, Together, etc.) or Moonshot's platform for current rates before budgeting. ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
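For rough budgeting, a back-of-the-envelope estimate using the illustrative rates quoted above (not authoritative; check current provider pricing):
# Estimate spend from per-million-token rates (USD); rates are illustrative.
def token_cost(input_m, output_m, in_rate, out_rate):
    return input_m * in_rate + output_m * out_rate

# e.g. 500M input / 100M output tokens at DeepInfra-style rates:
print(token_cost(500, 100, in_rate=0.50, out_rate=2.00))  # 450.0 USD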
Benchmarks
LiveCodeBench v6 (Pass@1): 53.7% (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
SWE‑bench Verified (Agentic, single attempt Pass@1): 65.8% (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
MATH‑500 (accuracy): 97.4% (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
GPQA‑Diamond (Avg@8): 75.1% (Source: https://huggingface.co/moonshotai/Kimi-K2-Instruct)
BrowseComp (agentic web search and reasoning): 60.2%, as reported in VentureBeat's coverage of the Kimi K2 Thinking variant (Source: https://venturebeat.com/ai/moonshots-kimi-k2-thinking-emerges-as-leading-open-source-ai-outperforming)
Key Information
- Category: Language Models
- Type: AI Language Models Tool