Llama-3.1-Tulu-3-8B - AI Language Models Tool

Overview

Llama-3.1-Tulu-3-8B is an instruction-following post-trained model from AllenAI, built on Meta's Llama 3.1 base and released as part of the Tülu 3 family. The release packages a supervised fine-tuning (SFT) stage, a Direct Preference Optimization (DPO) stage, and a Reinforcement Learning with Verifiable Rewards (RLVR) stage, and targets chat, mathematical reasoning, coding, and other instruction-oriented tasks. The Tülu 3 project also publishes its recipes, datasets, and evaluation tooling to support reproducible post-training research. ([arxiv.org](https://arxiv.org/abs/2411.15124?utm_source=openai))

The 8B-parameter Tülu 3 model offers multiple fine-tuned variants (SFT, DPO, final RLVR) along with extensive benchmark tables and hyperparameter notes on its Hugging Face model page. Hugging Face provides usage and deployment guidance (a transformers example and vLLM notes), file-format details (safetensors, BF16), and license information under the Llama 3.1 Community License. Users should weigh safety trade-offs: the model family documents limited safety training and advises careful, human-in-the-loop filtering for production deployments. ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai))

Model Statistics

  • Downloads: 2,714
  • Likes: 178
  • Pipeline: text-generation

License: llama3.1

Model Details

Architecture and scale: the model is an 8-billion-parameter causal LLM derived from meta-llama/Llama-3.1-8B and distributed in BF16 safetensors format. Hugging Face lists the model as the final RLVR checkpoint for the 8B Tülu 3 family, with separate published SFT and DPO checkpoints for comparison and research. ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai))

Training and post-training: the Tülu 3 recipe applies supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning with Verifiable Rewards (RLVR). The model card and the Tülu 3 paper describe the training datasets and the post-training pipeline, and the project releases the code and data needed to reproduce it. Detailed PPO/RL hyperparameters used for RLVR (learning rate, discount factor, PPO clipping coefficient, batch size, etc.) are published on the model card. ([arxiv.org](https://arxiv.org/abs/2411.15124?utm_source=openai))

Deployment notes: Hugging Face provides a transformers load snippet and recommends vLLM for serving; the model includes a built-in chat template and practical tips (for example, vLLM --max_model_len guidance). The model is released under the Llama 3.1 Community License together with AllenAI's Responsible Use Guidelines for Tülu releases. ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai))
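The DPO stage above optimizes the policy directly on preference pairs rather than training a separate reward model. A minimal sketch of the standard per-pair DPO loss from the DPO literature (variable names are illustrative, not AllenAI's training code):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    logp_* are summed token log-probabilities of the chosen/rejected responses
    under the policy; ref_* are the same quantities under the frozen reference.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When policy and reference agree (zero margin) the loss is log 2; it falls as the policy assigns relatively more probability to the chosen response.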

Key Features

  • 8B-parameter Llama 3.1–based causal language model (Tülu 3 final RLVR checkpoint).
  • Multiple post-training stages: SFT, Direct Preference Optimization (DPO), and RLVR.
  • Strong math and code performance (notable GSM8K and HumanEval scores).
  • Open release: training recipes, datasets, and evaluation code published for reproducibility.
  • Distributed in BF16 safetensors format and compatible with transformers and vLLM serving.
  • Model card includes detailed RL hyperparameters, decontamination notes, and safety guidance.
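The RLVR stage listed above replaces a learned reward model with a programmatic check: the policy is rewarded only when its answer can be verified against ground truth. A toy sketch of such a verifier, assuming a GSM8K-style numeric final answer (the actual Tülu 3 verifiers are described in the paper; this extraction heuristic is hypothetical):

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 iff the last number in the completion equals gold_answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == gold_answer else 0.0
```

Because the reward is computed by a deterministic check rather than a model, it cannot be gamed the way a learned reward model can, which is the core motivation for RLVR on math and code tasks.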

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "allenai/Llama-3.1-Tulu-3-8B"

# Load tokenizer and model (BF16; device_map="auto" requires accelerate)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Use the built-in chat template instead of hand-writing special tokens,
# so the prompt matches the format the model was post-trained on.
messages = [
    {"role": "user", "content": "Write a short, clear explanation of Bayes' theorem."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Note: Hugging Face model card also lists vLLM serving examples and recommends --max_model_len adjustments for long contexts. ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai))
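For serving, the model card recommends vLLM; a launch command along these lines starts an OpenAI-compatible server (flag spelling follows recent vLLM CLIs; check your installed version, and adjust the context length to your GPU memory):

```shell
# Serve the model with vLLM; --max-model-len caps context to fit GPU memory.
vllm serve allenai/Llama-3.1-Tulu-3-8B --max-model-len 8192
```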

Benchmarks

Average (composite across evals) — Tülu 3 (RLVR, 8B): 64.8 (reported composite average in model card table) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

MMLU (0-shot, CoT) — Tülu 3 (8B): 68.2 (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

GSM8K (8-shot, CoT): 87.6 (Tülu 3 8B column in model card) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

MATH (4-shot, CoT, Flex): 43.7 (Tülu 3 8B; shows strong math improvement in DPO/RLVR variants) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

HumanEval (pass@10): 83.9 (reported for Tülu 3 8B variants) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

TruthfulQA (6-shot): 55.0 (Tülu 3 8B final) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

Open LLM Leaderboard — Average (Open-LLM aggregated): 25.88 (Open LLM Leaderboard snapshot shown on model page) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))

Downloads (recent month) — Hugging Face: 2,154 downloads last month (as reported on model page) (Source: ([huggingface.co](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B?utm_source=openai)))
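Note that the composite average above (64.8) is reported over the model card's full evaluation suite, not just the scores quoted here; a naive mean of only the quoted scores (illustrative arithmetic, not the card's methodology) comes out differently:

```python
# Mean of only the scores quoted in this section (not the card's full suite).
scores = {"MMLU": 68.2, "GSM8K": 87.6, "MATH": 43.7, "HumanEval": 83.9, "TruthfulQA": 55.0}
mean = sum(scores.values()) / len(scores)
print(round(mean, 2))  # 67.68
```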

Last Refreshed: 2026-03-03

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool