Smaug-72B-v0.1 - AI Language Models Tool

Overview

Smaug-72B-v0.1 is an open-source 72-billion-parameter language model released by Abacus.AI in early 2024. It is a fine-tuned variant ultimately derived from Qwen-72B via the MoMo-72B LoRA checkpoint, optimized with a new preference-tuning method called DPO-Positive (DPOP). Abacus.AI published the model together with a paper describing DPOP and the training procedure; the Hugging Face model card reports that Smaug-72B became the first open-source model to surpass an average score of 80 on the Hugging Face Open LLM Leaderboard. ([huggingface.co](https://huggingface.co/abacusai/Smaug-72B-v0.1))

Smaug targets stronger performance on reasoning and math-style tasks (GSM8K, MMLU, ARC, HellaSwag) through pairwise-preference fine-tuning datasets and DPOP. The model card and project repository provide per-benchmark results (average 80.48 across major benchmarks) and MT-Bench conversational scores (first turn 8.18, second turn 7.34, average 7.76).

At the same time, community discussion shows mixed qualitative feedback: some users praise the benchmark improvements, while others have raised concerns about overalignment, repetitive outputs, or usability in open-ended chat settings. Abacus.AI and the project repository document the training details and release artifacts. ([huggingface.co](https://huggingface.co/abacusai/Smaug-72B-v0.1))

Model Statistics

  • Downloads: 7,427
  • Likes: 467
  • Pipeline: text-generation
  • Parameters: 72.3B

License: other

Model Details

Architecture and provenance: Smaug-72B-v0.1 is a 72.3B-parameter causal language model that uses BF16 tensors and is distributed as safetensors. It is a direct fine-tune of moreh/MoMo-72B-lora-1.8.7-DPO and is ultimately based on Qwen-72B; Abacus.AI describes the chain of base model → LoRA SFT → DPOP tuning in the model card and repository. ([huggingface.co](https://huggingface.co/abacusai/Smaug-72B-v0.1))

DPO-Positive (DPOP): the key methodological novelty is DPO-Positive, a modification of Direct Preference Optimization that adds a penalty term to avoid decreasing the likelihood of preferred completions in low-edit-distance pairings (important for math and reasoning tasks). The authors provide theoretical analysis and empirical comparisons in their arXiv paper and repository; both Smaug releases (34B and 72B) were produced with this method. ([arxiv.org](https://arxiv.org/abs/2402.13228))

Format and usage notes: the Hugging Face card indicates the model is available in safetensors BF16 format and includes evaluation artifacts (contamination checks, MT-Bench samples). The model is released under the Tongyi Qianwen license agreement, as noted on the model page; users should review that license before deploying. ([huggingface.co](https://huggingface.co/abacusai/Smaug-72B-v0.1))
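For concreteness, the DPOP objective described above can be thought of as the standard DPO margin plus a penalty that activates only when the policy assigns the preferred completion a lower likelihood than the reference model does. The following is a minimal sketch of that loss; the function name, tensor conventions, and the beta/lam values are illustrative placeholders, not the released training hyperparameters.

Example (python):

import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.3, lam=50.0):
    # Inputs are per-example sequence log-probabilities (summed over tokens)
    # under the trained policy and the frozen reference model.

    # Standard DPO implicit-reward margin between preferred and rejected completions.
    margin = (policy_chosen_logps - ref_chosen_logps) - (policy_rejected_logps - ref_rejected_logps)

    # DPOP penalty: positive only when the policy's likelihood of the preferred
    # completion drops below the reference model's likelihood of it.
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)

    # Subtracting the penalty inside the sigmoid discourages the optimizer from
    # lowering the preferred completion's likelihood just to widen the margin.
    return -F.logsigmoid(beta * (margin - lam * penalty)).mean()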

Key Features

  • DPO‑Positive fine‑tuning to avoid likelihood reduction of preferred completions.
  • Strong benchmark performance (average 80.48 across the Open LLM Leaderboard benchmarks).
  • High reasoning/math scores (notably GSM8K: 78.70).
  • Distributed as safetensors in BF16; 72.3B parameters (see the memory sketch after this list).
  • Open release with paper, repository, and evaluation artifacts for reproducibility.
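As a rough, back-of-the-envelope illustration of what the BF16 release implies for hardware (a sketch only; real memory use also includes the KV cache, activations, and framework overhead):

Example (python):

params = 72.3e9          # parameter count reported on the model card
bytes_per_param = 2      # BF16 stores each weight in 2 bytes

weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB for the weights alone")  # roughly 135 GiB

That footprint is why the usage example below relies on device_map="auto" to shard the model across available GPUs (or offload to CPU), and why a quantized-loading sketch follows it.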

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# This example shows how to load the model from the Hugging Face hub.
# Loading a 72B model locally requires multiple GPUs or offload infrastructure.

model_id = "abacusai/Smaug-72B-v0.1"

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Model (use device_map and appropriate dtype for your hardware)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # model card reports BF16
    device_map="auto",
    low_cpu_mem_usage=True
)

prompt = "Explain the key idea behind DPO-Positive in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate (example settings; tune for your environment).
# do_sample=True is needed for temperature to take effect.
generate_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))

# Note: check the model license and hardware requirements before deploying.
# See the model card on Hugging Face for evaluation examples and license details.
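Because the BF16 weights alone come to roughly 135 GiB, a common way to experiment on a single large GPU or a small multi-GPU box is 4-bit quantization through bitsandbytes. The sketch below assumes bitsandbytes and accelerate are installed; it is not part of the official model card, and quantization typically trades some accuracy for memory.

Example (python):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "abacusai/Smaug-72B-v0.1"

# NF4 4-bit quantization cuts the weight footprint to roughly a quarter of BF16
# (still on the order of 40 GB, plus KV cache and activations).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)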

Benchmarks

Hugging Face Open LLM Leaderboard average (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K): 80.48 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

ARC: 76.02 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

HellaSwag: 89.27 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

MMLU: 77.15 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

GSM8K: 78.70 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

MT-Bench (Llama-2 conversation template, Qwen system prompt): First turn 8.18, Second turn 7.34, Average 7.76 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

Downloads (last month): 7,513 (reported on model page) (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

Likes / Followers on model page: 467 likes (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool