Smaug-72B-v0.1 - AI Language Models Tool
Overview
Smaug-72B-v0.1 is an open-source 72-billion-parameter language model released by Abacus.AI that focuses on high-quality reasoning, math, and multi-turn text generation. The release accompanies a paper that introduces DPO-Positive (DPOP), a modification of Direct Preference Optimization designed to avoid a likelihood-collapse failure mode in preference fine-tuning; using DPOP the authors release Smaug-34B and Smaug-72B and report strong benchmark wins. According to the model card and paper, Smaug-72B is finetuned from moreh/MoMo-72B-lora-1.8.7-DPO and ultimately based on Qwen-72B, and it became the first open-weight model to exceed an average score of 80% on the Hugging Face Open LLM Leaderboard. The model is distributed as BF16 safetensors and is intended for research, benchmarking, and deployment where high-quality reasoning and preference-aligned outputs are required (see the Abacus.AI model card and the Smaug arXiv paper for training and evaluation details).
Model Statistics
- Downloads: 7,478
- Likes: 467
- Pipeline: text-generation
- License: other
Model Details
- Architecture & base: Smaug-72B-v0.1 is a 72B-parameter causal language model released as BF16 safetensors on Hugging Face. The model card explicitly states the model was finetuned from moreh/MoMo-72B-lora-1.8.7-DPO and is ultimately based on Qwen-72B.
- Fine-tuning method: the authors propose DPO-Positive (DPOP), a preference-optimization loss that preserves or raises the likelihood of preferred completions while still increasing pairwise preference margins; the method and training datasets are described in the arXiv paper.
- Files & size: weights are distributed as multi-part safetensors totaling roughly 145 GB on the Hugging Face files tab.
- Format & license: model files are BF16 safetensors, and the model card lists the license as "tongyi-qianwen-license-agreement".
- Intended capabilities: high-quality multi-turn dialogue, mathematical and logical reasoning, code generation, and question answering, with targeted fine-tuning on pairwise preference datasets (ARC, HellaSwag, MetaMath, and others).
- Practical notes: running Smaug-72B requires large-memory GPUs/TPUs or an inference provider; the Hugging Face card notes the model is not (as of the card) deployed by any inference provider.
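The ~145 GB figure on the files tab is consistent with a back-of-envelope estimate: 72 billion parameters at 2 bytes each in BF16. A minimal sketch of that arithmetic (weights only; activations and KV cache add more at inference time):

```python
# Rough weight-memory estimate for Smaug-72B in BF16.
# Excludes activations and KV cache, which add further GPU memory at inference.
params = 72e9          # 72 billion parameters
bytes_per_param = 2    # BF16 stores each parameter in 2 bytes
weight_gb = params * bytes_per_param / 1e9
print(f"approx. weight footprint: {weight_gb:.0f} GB")  # ~144 GB
```

This is why the practical notes below recommend multi-GPU setups or an inference provider rather than a single consumer GPU.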
Key Features
- DPO-Positive (DPOP) preference-fine-tuning that preserves preferred-completion likelihoods
- First open-weight model reported to surpass 80% average on the Hugging Face Open LLM Leaderboard
- 72B-parameter BF16 safetensors release, finetuned from moreh/MoMo-72B-lora-1.8.7-DPO
- Strong multi-turn and reasoning performance (MT-Bench first-turn 8.18; average 7.76)
- Targeted improvements on math/reasoning benchmarks (MetaMath, ARC, GSM8K)
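The DPOP objective named above can be sketched as follows. This is an illustrative reimplementation based on the paper's description, not the authors' training code; the function name and the default `beta`/`lam` values are assumptions. The key idea is that DPOP adds a penalty to the standard DPO margin that activates only when the policy assigns the preferred completion a lower log-likelihood than the reference model does:

```python
import math

def dpop_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l,
              beta=0.1, lam=5.0):
    """Per-example DPO-Positive loss (illustrative sketch, not the paper's code).

    pi_logp_w / pi_logp_l   : policy log-prob of the winning / losing completion
    ref_logp_w / ref_logp_l : reference-model log-prob of the same completions
    beta, lam               : temperature and penalty weight (assumed values)
    """
    # Standard DPO margin between implicit rewards.
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    # DPOP penalty: positive only when the policy likes the preferred
    # completion LESS than the reference model does.
    penalty = lam * max(0.0, ref_logp_w - pi_logp_w)
    logit = beta * (margin - penalty)
    # Negative log-sigmoid of the penalized margin.
    return -math.log(1.0 / (1.0 + math.exp(-logit)))
```

When the policy matches the reference, the loss reduces to -log σ(0) = log 2; raising the policy's likelihood of the winner keeps the penalty at zero, which is the "preserves preferred-completion likelihoods" property described above.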
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "abacusai/Smaug-72B-v0.1"
# NOTE: Smaug-72B is a 72B-parameter model. Loading requires GPUs/TPUs or an inference service.
# This example shows the local Hugging Face Transformers pattern; for real use you may prefer
# an inference provider or model-parallel setup (accelerate, DeepSpeed, or text-generation-inference).
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",            # or specify device_map for your setup
    torch_dtype=torch.bfloat16,   # weights are BF16
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
prompt = "Explain the Monty Hall problem and show a short simulation in Python."
inputs = tokenizer(prompt, return_tensors="pt")
# Ensure input tensors are on the same devices as the model
inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Benchmarks
Open LLM Leaderboard — average across ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K: 80.48 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Breakdown — ARC: 76.02 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Breakdown — HellaSwag: 89.27 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Breakdown — MMLU: 77.15 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Breakdown — TruthfulQA: 76.67 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Breakdown — Winogrande: 85.08 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Breakdown — GSM8K: 78.70 (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
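The six breakdown scores above are consistent with the reported 80.48 average; a quick arithmetic check:

```python
# Verify the Open LLM Leaderboard average from the per-benchmark scores.
scores = {
    "ARC": 76.02, "HellaSwag": 89.27, "MMLU": 77.15,
    "TruthfulQA": 76.67, "Winogrande": 85.08, "GSM8K": 78.70,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 80.48
```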
MT-Bench (GPT-4 judge) — First turn / Second turn / Average: 8.18 / 7.34 / 7.76 (single-model mode, llama-2 conversation template with Qwen system prompt) (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Contamination scores (code/text overlap reference suite): ARC: 0.20, TruthfulQA: 0.45, GSM8K: 1.00 (comparison numbers provided on the model card) (Source: https://huggingface.co/abacusai/Smaug-72B-v0.1)
Paper (method & claims): DPO-Positive (DPOP) introduced; DPOP outperforms DPO on several datasets and enables Smaug-72B's leaderboard performance (Source: https://arxiv.org/abs/2402.13228)
Key Information
- Category: Language Models
- Type: AI Language Models Tool