Smaug-72B-v0.1 - AI Language Models Tool

Overview

Smaug-72B-v0.1 is an open-source 72-billion-parameter language model released by Abacus.AI that focuses on high-quality reasoning, math, and multi-turn text generation. The release accompanies a paper that introduces DPO-Positive (DPOP), a modification of Direct Preference Optimization designed to avoid a likelihood-collapse failure mode in preference fine-tuning; using DPOP the authors release Smaug-34B and Smaug-72B and report strong benchmark wins. According to the model card and paper, Smaug-72B is finetuned from moreh/MoMo-72B-lora-1.8.7-DPO and ultimately based on Qwen-72B, and it became the first open-weight model to exceed an average score of 80% on the Hugging Face Open LLM Leaderboard. The model is distributed as BF16 safetensors and is intended for research, benchmarking, and deployment where high-quality reasoning and preference-aligned outputs are required (see the Abacus.AI model card and the Smaug arXiv paper for training and evaluation details).

Model Statistics

  • Downloads: 7,478
  • Likes: 467
  • Pipeline: text-generation

License: other

Model Details

  • Architecture & base: Smaug-72B-v0.1 is a 72B-parameter causal language model released as BF16 safetensors on Hugging Face. The model card states that it was finetuned from moreh/MoMo-72B-lora-1.8.7-DPO and is ultimately based on Qwen-72B.
  • Fine-tuning method: the authors propose DPO-Positive (DPOP), a preference-optimization loss that preserves or raises the likelihood of preferred completions while still increasing pairwise preference margins; the method and training datasets are described in the arXiv paper.
  • Files & size: weights are distributed as multi-part safetensors totaling roughly 145 GB on the Hugging Face files tab.
  • Format & license: model files are BF16 safetensors, and the model card lists the license as "tongyi-qianwen-license-agreement".
  • Intended capabilities: high-quality multi-turn dialogue, mathematical and logical reasoning, code generation, and question answering, with targeted fine-tuning on pairwise preference datasets (ARC, HellaSwag, MetaMath, and others).
  • Practical notes: running Smaug-72B requires multiple large-memory GPUs/TPUs or an inference provider; at the time the model card was written, no inference provider had deployed the model.
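The intuition behind DPOP can be sketched on scalar log-probabilities. This is an illustrative reconstruction from the paper's description, not the authors' training code: `beta` and `lam` are assumed hyperparameter names, and real training operates on batched token-level log-probs rather than scalars.

```python
import math

def dpop_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.3, lam=50.0):
    """Schematic DPO-Positive loss on per-example log-probabilities.

    r_w and r_l are the implicit-reward terms of standard DPO; the
    max(0, ...) penalty fires only when the policy's likelihood of the
    *preferred* completion drops below the reference model's, which is
    the likelihood-collapse failure mode DPOP is designed to avoid.
    """
    r_w = logp_w - ref_logp_w          # implicit reward, preferred completion
    r_l = logp_l - ref_logp_l          # implicit reward, dispreferred completion
    penalty = max(0.0, ref_logp_w - logp_w)
    margin = beta * (r_w - r_l - lam * penalty)
    return math.log1p(math.exp(-margin))  # -log(sigmoid(margin))

# Policy raises the preferred completion's likelihood: penalty is zero.
loss_ok = dpop_loss(-10.0, -20.0, ref_logp_w=-12.0, ref_logp_l=-18.0)
# Policy lowers it: the penalty term dominates and the loss grows.
loss_bad = dpop_loss(-14.0, -20.0, ref_logp_w=-12.0, ref_logp_l=-18.0)
```

With plain DPO the second case can still look acceptable because the pairwise margin is unchanged; the DPOP penalty is what makes it costly.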

Key Features

  • DPO-Positive (DPOP) preference-fine-tuning that preserves preferred-completion likelihoods
  • First open-weight model reported to surpass 80% average on the Hugging Face Open LLM Leaderboard
  • 72B-parameter BF16 safetensors release, finetuned from moreh/MoMo-72B-lora-1.8.7-DPO
  • Strong multi-turn and reasoning performance (MT-Bench first-turn 8.18; average 7.76)
  • Targeted improvements on math/reasoning benchmarks (MetaMath, ARC, GSM8K)

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "abacusai/Smaug-72B-v0.1"

# NOTE: Smaug-72B is a 72B-parameter model. Loading requires GPUs/TPUs or an inference service.
# This example shows the local Hugging Face Transformers pattern; for real use you may prefer
# an inference provider or model-parallel setup (accelerate, DeepSpeed, or text-generation-inference).

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",             # or specify device_map for your setup
    torch_dtype=torch.bfloat16,    # weights are BF16
    trust_remote_code=True,
    low_cpu_mem_usage=True
)

prompt = "Explain the Monty Hall problem and show a short simulation in Python."
inputs = tokenizer(prompt, return_tensors="pt")

# Ensure input tensors are on the same devices as the model
inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=300, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
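The hardware note above can be sanity-checked with back-of-envelope arithmetic: BF16 stores two bytes per parameter, so 72 billion parameters come to roughly 144 GB of weights alone (before activations and KV cache), consistent with the ~145 GB of safetensors shards on the files tab.

```python
params = 72e9         # 72 billion parameters
bytes_per_param = 2   # BF16 = 16 bits = 2 bytes
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB of BF16 weights")  # ~144 GB
```

This is why `device_map="auto"` alone is not enough on a single GPU; sharding across several large accelerators or 4-/8-bit quantization is needed in practice.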

Benchmarks

Open LLM Leaderboard — average across ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K: 80.48 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

  • ARC: 76.02
  • HellaSwag: 89.27
  • MMLU: 77.15
  • TruthfulQA: 76.67
  • Winogrande: 85.08
  • GSM8K: 78.70
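The headline leaderboard figure can be re-derived from the six breakdown scores above; it is a simple unweighted mean:

```python
scores = {
    "ARC": 76.02,
    "HellaSwag": 89.27,
    "MMLU": 77.15,
    "TruthfulQA": 76.67,
    "Winogrande": 85.08,
    "GSM8K": 78.70,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 80.48
```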

MT-Bench (GPT-4 judge) — First turn / Second turn / Average: 8.18 / 7.34 / 7.76 (single-model mode, llama-2 conversation template with Qwen system prompt) (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
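The MT-Bench setting above pairs the llama-2 conversation template with a Qwen system prompt. As an illustrative sketch only: `llama2_prompt` is a hypothetical helper, and the exact template string and system prompt used by the evaluation harness are assumptions based on the standard llama-2 chat format, not reproduced from the card.

```python
def llama2_prompt(system: str, user: str) -> str:
    # llama-2 chat format: the system block is wrapped in <<SYS>> markers
    # inside the first [INST] ... [/INST] turn.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = llama2_prompt(
    "You are a helpful assistant.",
    "Explain the Monty Hall problem.",
)
```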

Contamination scores (code/text overlap reference suite): ARC: 0.20, TruthfulQA: 0.45, GSM8K: 1.00 (comparison numbers provided on the model card) (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

Paper (method & claims): DPO-Positive (DPOP) introduced; DPOP outperforms DPO on several datasets and enables Smaug-72B's leaderboard performance (Source: https://arxiv.org/abs/2402.13228)

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool