Llama-3.1-Tulu-3-8B - AI Language Models Tool
Overview
Llama-3.1-Tulu-3-8B is an 8-billion-parameter instruction-following model from the Allen Institute for AI (Tülu 3 family), built on Meta's Llama 3.1 8B base and released with open training recipes, data, and evaluation code. The Tülu 3 lineup was designed to demonstrate modern post-training techniques (SFT, DPO, and RL-based stages) and to provide reproducible recipes and benchmarks for a broad set of NLP skills, including chat, reasoning, math, and code. ([arxiv.org](https://arxiv.org/abs/2411.15124))

The model family includes intermediate checkpoints (SFT and DPO), a reward model, and final RL-tuned versions; AllenAI also published a v3.1 update in which the RL stage switched from PPO to GRPO (with no separate reward model) and reported improved final-stage performance. Tülu 3 emphasizes transparency: the paper, training code, evaluation suite, and deployment notes are available alongside the weights, and the Hugging Face model card provides usage tips (Transformers, vLLM, chat templates) and hyperparameter details. ([arxiv.org](https://arxiv.org/abs/2411.15124))
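For context on the middle post-training stage: DPO fine-tunes directly on preference pairs with a logistic loss rather than training a separate reward model. The standard objective from the original DPO paper (stated generically here; the Tülu 3 paper documents its own hyperparameters) is:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\left[\log\sigma\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]$$

where $y_w$ and $y_l$ are the preferred and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is a frozen reference policy (typically the SFT checkpoint), and $\beta$ controls how far the tuned policy may drift from the reference.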
Model Statistics
- Downloads: 3,882
- Likes: 177
- Pipeline: text-generation
- Parameters: 8.0B
- License: llama3.1
Model Details
Architecture and lineage: Llama-3.1-Tulu-3-8B is a post-trained (finetuned) variant of meta-llama/Llama-3.1-8B (8B parameters), distributed as BF16 safetensors and intended for research and instructional use under the Llama 3.1 Community License. The published family includes SFT, DPO, and final RLVR models; the v3.1 update replaced PPO with GRPO for the final RL stage of the 8B release. ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B))

Training recipe and capabilities: AllenAI documents the full post-training pipeline: supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning with Verifiable Rewards (RLVR), with explicit hyperparameters for each stage (for example, SFT uses a max sequence length of 4096, DPO uses 2048, and the RL-stage hyperparameters are published on the model card). The Tülu 3 paper and model card report benchmarks (MMLU, GSM8K, MATH, HumanEval, IFEval) and comparisons to other instruct models. The project also releases evaluation code and data-decontamination tools so researchers can reproduce the reported scores. ([arxiv.org](https://arxiv.org/abs/2411.15124))
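To make the RLVR idea concrete, here is a minimal, hypothetical sketch of a verifiable reward for math problems: the model's completion is checked against a ground-truth answer, and the binary outcome serves as the RL reward. The function name and the GSM8K-style "#### answer" convention are illustrative assumptions, not AllenAI's actual implementation.

Example (Python):
import re

def gsm8k_style_reward(completion: str, gold_answer: str) -> float:
    # Hypothetical verifiable reward: 1.0 if the final numeric answer matches, else 0.0.
    # Assumes GSM8K-style completions ending in '#### <answer>'; illustrative sketch only.
    match = re.search(r"####\s*([-+]?[\d,]*\.?\d+)", completion)
    if match is None:
        return 0.0
    predicted = match.group(1).replace(",", "")
    return 1.0 if predicted == gold_answer.strip() else 0.0

print(gsm8k_style_reward("...so the total is 42.\n#### 42", "42"))  # 1.0
print(gsm8k_style_reward("...the answer is 41.\n#### 41", "42"))    # 0.0

Because the reward is computed by a checker rather than a learned model, RLVR sidesteps reward-model exploitation on tasks where correctness can be verified programmatically.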
Key Features
- 8B-parameter Llama 3.1-based instruction-following model with BF16 safetensors.
- Open training recipes (SFT, DPO, RLVR) and complete evaluation code released by AllenAI.
- Competitive scores on GSM8K, MATH, and other reasoning benchmarks relative to comparable 8B instruct models.
- Multiple checkpoints available (SFT, DPO, Reward Model, RL final) for research ablation.
- Deployment notes for Transformers and vLLM; the tokenizer ships with a built-in chat template (see the snippet after this list).
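As a quick illustration of the chat-template integration, the snippet below renders a message list into the model's prompt format without loading any weights or generating text. It assumes only that the tokenizer on the Hugging Face Hub bundles a chat template, as the model card indicates; the example message is arbitrary.

Example (Python):
from transformers import AutoTokenizer

# Downloading the tokenizer is lightweight; no model weights are fetched here.
tokenizer = AutoTokenizer.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
messages = [{"role": "user", "content": "What is DPO in one sentence?"}]
# Render the conversation into the exact prompt string the model expects.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)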
Example Usage
Example (Python):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer (weights are distributed under the Llama 3.1 Community License).
model_id = "allenai/Llama-3.1-Tulu-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Instruction-tuned models expect the chat format; apply the tokenizer's chat template.
messages = [{"role": "user", "content": "Write a brief explanation of how gradient descent works, suitable for a college student."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
For production or high-throughput serving, AllenAI recommends vLLM or dedicated inference stacks; see the model card for vLLM tips. ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B))
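Below is a minimal serving sketch with vLLM, assuming a recent vLLM release (with the LLM.chat API) and a GPU with enough memory for the 8B weights; the sampling settings are illustrative, not AllenAI's recommended values.

Example (Python):
from vllm import LLM, SamplingParams

# vLLM loads the weights once and batches requests efficiently.
llm = LLM(model="allenai/Llama-3.1-Tulu-3-8B")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

# The chat API applies the model's chat template before generation.
messages = [{"role": "user", "content": "Summarize what RLVR is in two sentences."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)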
Benchmarks
All scores are for Tülu 3 8B as reported on the model card (https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B):
- Aggregate average over the published benchmark suite: 64.8
- GSM8K (8-shot, CoT): 87.6
- MMLU (0-shot, CoT): 68.2
- HumanEval (pass@10): 83.9
- MATH (4-shot, CoT): 43.7
Key Information
- Category: Language Models
- Type: AI Language Models Tool