AM-Thinking-v1 - AI Language Models Tool
Overview
AM-Thinking-v1 is an open-source, reasoning-optimized dense language model developed by the a-m-team (an internal research group at Beike / Ke.com). It was built by post-training the publicly available Qwen 2.5 32B base model with a focused supervised fine-tune (SFT) and a two-stage reinforcement-learning pipeline to encourage a "think-then-answer" behavior and improve mathematical and code reasoning. The authors report strong leaderboard-style results on competition math (AIME) and coding benchmarks while keeping the model small enough to deploy on a single high-memory GPU, making it a practical mid-scale alternative to much larger Mixture-of-Experts (MoE) systems. ([ar5iv.org](https://ar5iv.org/pdf/2505.08311))

AM-Thinking-v1 targets use cases that require multi-step reasoning, such as code generation with verifiable test cases, mathematical problem solving, and analytic writing. The release includes full model artifacts (safetensors shards and tokenizer), quantized GGUF builds for local inference, and accompanying technical reports and distilled datasets intended to support reproducible research and downstream distillation. The model is released under an Apache-2.0 license, and the project publishes an arXiv paper and supporting repos/documentation. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))
Model Statistics
- Downloads: 144
- Likes: 203
- Pipeline: text-generation
- Parameters: 32.8B
- License: apache-2.0
Model Details
Architecture and size: AM-Thinking-v1 is a 32B-scale dense causal language model derived from the Qwen 2.5-32B base. The Hugging Face model card lists the stored checkpoint shards with a displayed size of 33B params, while the paper describes the model as a 32B dense model; users should expect a roughly 33B-parameter checkpoint in safetensors shard form (BF16 tensors). ([ar5iv.org](https://ar5iv.org/pdf/2505.08311))

Post-training pipeline: The developers apply a three-step post-training pipeline: (1) cold-start supervised fine-tuning on a blended math/code/chat mixture to induce a think-then-answer style; (2) pass-rate-aware curation that keeps problems with intermediate pass rates (0 < pass < 1) to focus learning on informative examples; and (3) a two-stage RL (GRPO) procedure, where Stage 1 concentrates on math and code, and Stage 2 refines training by removing items the model solves completely and adjusting hyperparameters. Rewards for verifiable tasks use rule-based verification and code execution; math answers are validated with math_verify. These pipeline details are described in the project paper and model card. ([ar5iv.org](https://ar5iv.org/pdf/2505.08311))

Data and evaluation: All training queries are drawn from publicly available datasets (math, code, scientific reasoning, instruction-following, and general chat). The team reports extensive deduplication, decontamination against evaluation sets, synthetic-response filtering (perplexity and n-gram filters), and ground-truth verification for math and code examples. The release also includes distilled reasoning datasets and additional technical reports on distillation and staged RL. ([ar5iv.org](https://ar5iv.org/pdf/2505.08311))

Deployment and formats: Model artifacts are provided as safetensors shards and tokenizer files on Hugging Face; the card lists quantized GGUF variants for llama.cpp/Ollama local inference. The authors state the model can be run on a single A100-80GB with deterministic latency (no MoE routing). The tensor type indicated on HF is BF16. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))

Limitations & safety: The authors note the model was not trained specifically for structured function-calling or tool-use agents, and that safety alignment is early-stage; further red-teaming and alignment work is advised before production deployment. Community discussions also raise questions about dataset release and verification practices. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))
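The pass-rate-aware curation step described above can be sketched in a few lines. The record fields (`prompt`, `pass_rate`) below are illustrative assumptions, not the project's actual data schema:

```python
# Illustrative sketch of pass-rate-aware curation: keep only problems the
# model sometimes solves (0 < pass_rate < 1). Always-solved and never-solved
# items carry little training signal, so both extremes are dropped.
def curate_by_pass_rate(records):
    return [r for r in records if 0.0 < r["pass_rate"] < 1.0]

records = [
    {"prompt": "trivial problem", "pass_rate": 1.0},   # always solved -> dropped
    {"prompt": "useful problem", "pass_rate": 0.4},    # informative -> kept
    {"prompt": "too-hard problem", "pass_rate": 0.0},  # never solved -> dropped
]
kept = curate_by_pass_rate(records)
print([r["prompt"] for r in kept])  # ['useful problem']
```

In Stage 2 of the RL procedure, the same idea is applied again: items the model now solves completely are removed so training keeps focusing on the informative middle.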
Key Features
- 32B‑scale dense model optimized for multi‑step mathematical and coding reasoning.
- Post‑training pipeline: cold‑start SFT, pass‑rate-aware curation, and two‑stage RL (GRPO).
- Verifiable rewards: math_verify for math, sandboxed test execution for code verification.
- Deployable on a single high‑memory GPU; quantized GGUF builds for local inference.
- Open artifacts and distilled reasoning datasets released for reproducibility and distillation.
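The verifiable-reward idea for math can be illustrated with a minimal stand-in: normalize the model's final answer, compare it to the ground truth, and emit a binary reward. This is a simplified sketch, not the actual math_verify API, which handles far richer answer formats:

```python
from fractions import Fraction

# Simplified rule-based math reward: parse both answers as exact rationals
# and return 1.0 on a match, 0.0 otherwise (including unparseable answers).
def math_reward(model_answer: str, ground_truth: str) -> float:
    try:
        match = Fraction(model_answer.strip()) == Fraction(ground_truth.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0
    return 1.0 if match else 0.0

print(math_reward("3/6", "1/2"))   # 1.0 (equivalent rationals)
print(math_reward("0.75", "3/4"))  # 1.0
print(math_reward("2", "3"))       # 0.0
```

For code, the analogous reward is pass/fail from executing the generated program against test cases in a sandbox, as the feature list above notes.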
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "a-m-team/AM-Thinking-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)
prompt = "How can I find inner peace?"
messages = [{"role": "user", "content": prompt}]
# The model card indicates a chat template and a system prompt used during SFT/RL.
# Use tokenizer.apply_chat_template to produce the expected formatted input.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate (tune max_new_tokens and decoding to your needs)
generated = model.generate(**inputs, max_new_tokens=512)
output_ids = generated[0][len(inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
# AM‑Thinking outputs include <think> and <answer> regions per the model card
think = response.split("<think>")[1].split("</think>")[0] if "<think>" in response else None
answer = response.split("<answer>")[1].split("</answer>")[0] if "<answer>" in response else response
print("--- USER PROMPT ---\n", prompt)
print("--- MODEL THINK ---\n", think)
print("--- MODEL ANSWER ---\n", answer)
Benchmarks
AIME 2024 (math competition benchmark): 85.3 (Source: https://arxiv.org/abs/2505.08311)
AIME 2025 (math competition benchmark): 74.4 (Source: https://arxiv.org/abs/2505.08311)
LiveCodeBench (code generation benchmark): 70.3 (Source: https://arxiv.org/abs/2505.08311)
Hugging Face downloads last month: 144 (Source: https://huggingface.co/a-m-team/AM-Thinking-v1)
Model card reported tensor type / checkpoint format: BF16 safetensors shards (multi‑file checkpoint) (Source: https://huggingface.co/a-m-team/AM-Thinking-v1)
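As a quick sanity check on the single-A100-80GB deployment claim, the BF16 weight footprint of a 32.8B-parameter checkpoint can be estimated (weights only; the KV cache and activations add more on top):

```python
# Back-of-the-envelope weight memory for 32.8B parameters in BF16 (2 bytes each).
params = 32.8e9
weights_gib = params * 2 / 1024**3
print(f"~{weights_gib:.1f} GiB of weights")  # ~61.1 GiB, fits on one 80 GB A100
```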
Key Information
- Category: Language Models
- Type: AI Language Models Tool