Kimi-K2 - AI Language Models Tool

Overview

Kimi-K2 is an open-source, trillion-parameter mixture-of-experts (MoE) large language model series from Moonshot AI designed for high-end reasoning, coding, and agentic tool use. Architecturally, it has 1.0 trillion total parameters with 32 billion activated per token, MoE routing over 384 experts (8 selected per token), and a 128K-token context window. The project emphasizes stability at scale via the MuonClip optimizer and a large-scale pretraining corpus (reported at 15.5T tokens), and ships both base and post-trained (Instruct) variants optimized for drop-in chat and tool-enabled agents. ([github.com](https://github.com/MoonshotAI/Kimi-K2))

Kimi-K2 is released with model checkpoints (block-fp8 format) on Hugging Face, along with deployment examples targeting inference engines such as vLLM, SGLang, KTransformers, and TensorRT-LLM. The project includes an agentic-data pipeline and reinforcement-learning stages intended to improve multi-step tool use and autonomous problem solving; evaluation tables in the tech report show strong results on coding, math, and agentic benchmarks (e.g., LiveCodeBench, SWE-bench, Tau2/AceBench). Moonshot also offers an OpenAI/Anthropic-compatible API for hosted access, with published commercial pricing and model variants on their platform. ([huggingface.co](https://huggingface.co/moonshotai/Kimi-K2-Instruct))
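The sparse-activation idea behind the architecture (8 of 384 experts per token, so only ~32B of 1T parameters fire) can be illustrated with a toy router. This is a hedged sketch, not Kimi-K2's actual routing code: the hidden size is tiny, the weights are random, and the top-k-then-softmax gating is a common MoE convention assumed here for illustration.

```python
import numpy as np

N_EXPERTS = 384   # total routed experts reported for Kimi-K2
TOP_K = 8         # experts selected per token
HIDDEN = 16       # toy hidden size; the real model is far larger

rng = np.random.default_rng(0)
router_w = rng.normal(size=(HIDDEN, N_EXPERTS))  # toy router weights

def route(token_hidden: np.ndarray):
    """Return (expert indices, gate weights) for one token."""
    logits = token_hidden @ router_w                  # score all 384 experts
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]    # unordered top-8 ids
    top = top[np.argsort(logits[top])[::-1]]          # sort by score, descending
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                                # softmax over the selected 8
    return top, gate

ids, gates = route(rng.normal(size=HIDDEN))
print(len(ids), float(gates.sum()))  # 8 experts; gate weights sum to 1
```

Only the selected experts' feed-forward blocks run for that token, which is why activated parameters (32B) stay far below total parameters (1T).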

GitHub Statistics

  • Stars: 10,415
  • Forks: 787
  • Contributors: 8
  • License: NOASSERTION
  • Last Updated: 2025-10-31T03:23:46Z

The GitHub repository is active and well-trafficked: it shows ~10.4k stars and ~787 forks, with a multi-part README, a tech report, and deployment docs. Issue activity (dozens of open and closed issues) indicates community questions and ongoing maintenance. The project ships under a Modified-MIT-style license (which GitHub's license detector reports as NOASSERTION) and links to model checkpoints on Hugging Face, supporting reproducibility and downstream research use. The accompanying tech report on arXiv was revised (v2, Feb 3, 2026), indicating continued development and public research scrutiny. ([github.com](https://github.com/MoonshotAI/Kimi-K2))

Installation

The repository itself is not a pip package; clone it, install an inference engine (vLLM shown here), and download the checkpoints from Hugging Face:

git clone https://github.com/MoonshotAI/Kimi-K2.git
git lfs install  # required for large checkpoint files
pip install "vllm>=0.10.0rc1" huggingface_hub blobfile
huggingface-cli login  # authenticate to download weights (if required)
huggingface-cli download moonshotai/Kimi-K2-Instruct --local-dir ./Kimi-K2-Instruct  # download model files from Hugging Face
# Example: start a local vLLM server (TP16 example from deploy guide)
vllm serve $MODEL_PATH --port 8000 --served-model-name kimi-k2 --trust-remote-code --tensor-parallel-size 16 --enable-auto-tool-choice --tool-call-parser kimi_k2
# Example: SGLang TP16 two-node launch (replace MASTER_IP, node-rank, etc.)
python -m sglang.launch_server --model-path $MODEL_PATH --tp 16 --dist-init-addr $MASTER_IP:50000 --nnodes 2 --node-rank 0 --trust-remote-code --tool-call-parser kimi_k2
# TensorRT-LLM and KTransformers deployment guidance is in docs/deploy_guidance.md (multi-node GPU clusters typically required).
Note: multi-node GPU clusters (H200/H800/H20/H100 families) and advanced parallelism (TP/DP+EP) are recommended for 128K context FP8 checkpoints.
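Once a server like the vLLM example above is running, it speaks the OpenAI-compatible chat API. The sketch below only assembles a request body client-side (no network call); the `http://localhost:8000/v1` endpoint, the `kimi-k2` served-model name, and the 0.6 temperature follow the example command and the model card's stated defaults, but verify them against your own deployment.

```python
import json

# Endpoint implied by the `vllm serve ... --port 8000` example; adjust as needed.
BASE_URL = "http://localhost:8000/v1"
REQUEST_URL = f"{BASE_URL}/chat/completions"

def build_chat_request(prompt: str, model: str = "kimi-k2") -> dict:
    """Assemble an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,  # must match --served-model-name on the server
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,  # recommended default per the Instruct model card
        "max_tokens": 512,
    }

body = build_chat_request("Summarize the MuonClip optimizer in one sentence.")
print(REQUEST_URL)
print(json.dumps(body, indent=2))
```

POSTing this body (e.g. with `requests` or the `openai` client pointed at `BASE_URL`) returns a standard chat-completion response.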

Key Features

  • 1T-parameter MoE architecture with 384 experts, 8 experts selected per token.
  • 32B activated parameters per token for efficient sparse inference.
  • 128K token context window for long-document and multi-step workflows.
  • MuonClip optimizer and stability techniques for trillion-scale pretraining.
  • Post-trained 'Instruct' variant optimized for tool-calling and chat.
  • Tool-calling/agentic pipeline enabling autonomous multi-step tool use.
  • Open checkpoints in block-fp8 format on Hugging Face for local deployment.
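The tool-calling loop the features above describe follows the usual OpenAI-style shape when served through vLLM with `--tool-call-parser kimi_k2`: the model emits `tool_calls`, the client executes them locally, and the results go back as `tool`-role messages. A minimal dispatch sketch, with a hypothetical `get_weather` tool standing in for real tools (the tool name and schema are illustrative, not from the Kimi-K2 docs):

```python
import json

# Toy local tool; name and behavior are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_calls: list[dict]) -> list[dict]:
    """Execute each tool call the model emitted and build 'tool' role replies."""
    replies = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return replies

# Simulated model output in the OpenAI tool-call shape the parser produces.
fake_calls = [{"id": "c1", "function": {"name": "get_weather",
                                        "arguments": json.dumps({"city": "Paris"})}}]
print(dispatch(fake_calls)[0]["content"])
```

In an agentic session this loop repeats: append the tool replies to the message history and re-query the model until it answers without further tool calls.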

Community

Kimi-K2 has generated strong community interest—high GitHub star/fork counts, a public tech report (arXiv), and a Hugging Face model card with ongoing changelogs—indicating research and engineering engagement. Users and commentators praise its open weights, long context, and agentic focus, and third‑party reporting highlights competitive benchmark performance. Community feedback is mixed in practice: many applaud the openness and agentic features, while some early adopters note resource complexity for deployment and occasional quality gaps on specific coding or domain tasks; support and tutorials (deployment guide, platform docs) aim to lower this friction. For hosted use, Moonshot’s platform provides API access and published pricing/variant updates; for self‑hosting, expect multi‑GPU clusters and recommended inference engines. ([github.com](https://github.com/MoonshotAI/Kimi-K2))

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool