AM-Thinking-v1 - AI Language Models Tool

Overview

AM-Thinking-v1 is an open-source, 32.8B-parameter dense language model focused on improving stepwise reasoning for math, code, and multi-step logical tasks. Built on Qwen 2.5-32B-Base and released by the a-m-team, the model uses a "think-then-answer" generation pattern (reasoning enclosed in <think>...</think> tags, final answers in <answer>...</answer> tags) and is designed to extract strong reasoning performance from a mid-scale dense model while remaining deployable on a single high-memory GPU. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))

The project combines supervised fine-tuning (SFT) on blended math/code/chat data with a pass-rate-aware curation step and a two-stage reinforcement learning (GRPO) pipeline that concentrates learning on partially solved problems. According to the authors, AM-Thinking-v1 achieves competitive benchmark scores (for example, AIME 2024: 85.3; AIME 2025: 74.4; LiveCodeBench: 70.3) that approach much larger Mixture-of-Experts models while keeping single-card deployability. The model is distributed under an Apache-2.0 license, and quantized GGUF builds are provided for local inference with llama.cpp/Ollama. ([arxiv.org](https://arxiv.org/abs/2505.08311))
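The think-then-answer format is straightforward to post-process once a completion has been generated. The snippet below is a minimal, illustrative sketch rather than code from the model card: it assumes the completion follows the <think>...</think> / <answer>...</answer> convention described above, and the helper name and sample string are hypothetical.

import re

def split_think_answer(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer) using the documented tags."""
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else completion.strip(),
    )

# Hypothetical completion in the documented format
sample = "<think>Angles along a straight line sum to 180 degrees...</think><answer>180 degrees.</answer>"
reasoning, answer = split_think_answer(sample)
print(answer)  # -> 180 degrees.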

Model Statistics

  • Downloads: 523
  • Likes: 199
  • Pipeline: text-generation
  • Parameters: 32.8B

License: apache-2.0

Model Details

Architecture and size: AM-Thinking-v1 is a dense Qwen-family transformer derived from Qwen 2.5-32B-Base and published with a ~32.8B-parameter footprint (BF16 training artifacts available). The Hugging Face model card and release assets list the model as a Qwen2 architecture, with the primary weights stored in safetensors format. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))

Context, precision and quantizations: community GGUF/llama.cpp builds and LM Studio community packaging report support for very long contexts (community packages list up to 132k tokens) and multiple quantization profiles (roughly 2-8 bit) for compact local inference; quantized GGUF artifacts are published alongside the base model. These quantizations reduce VRAM requirements to levels usable on consumer hardware and single-GPU servers. ([huggingface.co](https://huggingface.co/lmstudio-community/AM-Thinking-v1-GGUF))

Training pipeline and capabilities: the authors describe a three-step post-training pipeline: cold-start supervised fine-tuning on blended math/code/chat data; pass-rate-aware data curation that keeps only examples with pass rates strictly between 0 and 1 (problems the model sometimes, but not always, solves); and a two-stage GRPO reinforcement learning phase (stage 1 on math/code, stage 2 removing fully solved items and adjusting hyperparameters). The resulting model emphasizes stepwise reasoning (explicit internal "thinking" tokens), stronger code generation, and improved math/logic ability relative to the base model before post-training. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))

Limitations: the model card explicitly notes limited structured function-calling/tool-use alignment, early-stage safety alignment, and that agent/tool workflows are not yet a primary capability. Users should plan additional fine-tuning or safety evaluation before deploying the model in production interactive agents. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))
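To make the pass-rate-aware curation step concrete, the sketch below filters a candidate RL dataset so that only partially solved prompts remain. This is an illustrative reconstruction, not the a-m-team's code: the record layout and the pass_rate field are assumptions, with the pass rate presumed to be estimated by sampling the current model and verifying its answers.

# Illustrative pass-rate-aware curation: keep prompts with 0 < pass_rate < 1,
# i.e. problems the model sometimes solves but does not always solve.
def curate_for_rl(records):
    return [r for r in records if 0.0 < r["pass_rate"] < 1.0]

dataset = [
    {"prompt": "Solve x + 2 = 5", "pass_rate": 1.0},             # always solved -> dropped
    {"prompt": "AIME-style geometry problem", "pass_rate": 0.4},  # partially solved -> kept
    {"prompt": "Never-solved item", "pass_rate": 0.0},            # never solved -> dropped
]
print(curate_for_rl(dataset))  # only the partially solved prompt survives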

Key Features

  • Think‑then‑answer generation pattern with explicit <think> and <answer> tags.
  • Post‑training pipeline: SFT + pass‑rate curation + two‑stage GRPO reinforcement learning.
  • High reasoning scores on math/code benchmarks (AIME, LiveCodeBench).
  • Deployable on a single high‑memory GPU (authors report A100 80GB viability).
  • Quantized GGUF builds for local inference with llama.cpp / Ollama.
  • Open‑source Apache‑2.0 license and downloadable weights on Hugging Face.

Example Usage

Example (python):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "a-m-team/AM-Thinking-v1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Use the project-recommended chat template (model expects <think>/<answer> tags)
prompt = "Explain why the sum of interior angles in a triangle is 180 degrees."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Note: this snippet follows the quick-start on the model card (https://huggingface.co/a-m-team/AM-Thinking-v1).
# For long contexts or fully local inference, use one of the quantized GGUF builds with llama.cpp or Ollama.
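For the GGUF route, one option is the llama-cpp-python bindings. The sketch below is illustrative rather than taken from the model card: the file name, quantization level, context size, and GPU-offload setting are placeholder assumptions you would adjust for your own download and hardware.

from llama_cpp import Llama

llm = Llama(
    model_path="./AM-Thinking-v1-Q4_K_M.gguf",  # hypothetical local path to a quantized build
    n_ctx=8192,        # session context window; the published builds advertise far longer limits
    n_gpu_layers=-1,   # offload all layers to the GPU if VRAM allows
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How many primes are less than 30?"}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])

Ollama users can likewise run the published GGUF files directly; check the model files on Hugging Face for the available quantization profiles.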

Pricing

AM‑Thinking‑v1 is released under the Apache‑2.0 license and the model weights are freely downloadable from Hugging Face. There is no vendor pricing for the model itself; inference or hosted API usage costs depend on the chosen provider or self‑hosting hardware (GPU/CPU) and are not specified by the authors. ([huggingface.co](https://huggingface.co/a-m-team/AM-Thinking-v1))

Benchmarks

  • AIME 2024 score: 85.3 (Source: https://arxiv.org/abs/2505.08311)
  • AIME 2025 score: 74.4 (Source: https://arxiv.org/abs/2505.08311)
  • LiveCodeBench score: 70.3 (Source: https://arxiv.org/abs/2505.08311)
  • Model card reported downloads (last month): 523 (Source: https://huggingface.co/a-m-team/AM-Thinking-v1)

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool