Llama 4 Maverick & Scout - AI Language Models Tool
Overview
Llama 4 Maverick and Scout are Meta’s next-generation Mixture-of-Experts (MoE) language models, released on April 5, 2025 and hosted on the Hugging Face Hub. Both activate 17B parameters per forward pass while exposing very different total capacities: Scout is ≈109B total parameters with 16 experts, and Maverick is ≈400B total with 128 experts. They were trained on large multilingual corpora (reported ~22–40 trillion tokens) and accept native multimodal inputs (text + images) via an early-fusion design. (See the Hugging Face release post and model cards for details: https://huggingface.co/blog/llama4-release; https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E.)
Notable engineering choices target extreme context lengths and deployability. Scout’s instruction-tuned variant supports a 10-million-token context window, while Maverick’s instruct variant supports up to 1 million tokens; both were pre-trained with a shorter 256K context and use an interleaved NoPE/iRoPE plus chunked-attention design to scale long context efficiently (Hugging Face blog).
The models integrate with Hugging Face Transformers (v4.51.0+) and Text Generation Inference (TGI) for optimized serving, and offer quantization paths (Scout: on-the-fly 4-bit/8-bit; Maverick: BF16 and FP8 formats).
Community response has been mixed: reviewers report strong benchmark numbers on many tasks, but have also raised concerns about benchmark transparency and inconsistent real-world behavior. Multiple news and forum threads discuss these reactions (e.g., The Verge and Hugging Face community posts).
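The chunked-attention idea mentioned above can be sketched in a few lines: most layers restrict attention to a fixed-size chunk, so attention cost grows roughly linearly with sequence length rather than quadratically. The sketch below is illustrative only — the chunk size and sequence length are toy values, and Llama 4's real design additionally interleaves full-attention NoPE layers.

```python
# Minimal sketch of chunked causal attention masking (illustrative, not
# Meta's implementation). Queries may only attend to earlier positions
# within their own chunk.

def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        chunk_start = (q // chunk_size) * chunk_size
        for k in range(chunk_start, q + 1):  # causal: only positions <= q
            mask[q][k] = True
    return mask

mask = chunked_causal_mask(seq_len=8, chunk_size=4)
# Position 5 sits in the second chunk (positions 4-7), so it sees only 4 and 5.
print([k for k in range(8) if mask[5][k]])  # -> [4, 5]
```

With chunking, each query attends to at most `chunk_size` keys, which is what makes multi-million-token contexts tractable; the interleaved NoPE layers restore global information flow.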
Key Features
- Mixture-of-Experts (MoE): 17B active parameters per forward pass, with 16 experts in Scout and 128 in Maverick.
- Native multimodality (early fusion): process text and images together for tasks like ChartQA and DocVQA.
- Extreme context windows: Scout instruction-tuned to 10M tokens; Maverick instruction-tuned to 1M tokens.
- Transformer & TGI integration: official support in transformers (v4.51.0+) and Text Generation Inference for scalable serving.
- Quantization and deployment options: Scout supports on-the-fly 4-bit/8-bit quant; Maverick offers BF16/FP8 weights.
- Large multilingual training: pre-trained across ~200 languages on a reported ~22–40 trillion tokens (per the model cards).
- Community license & governance: released under a Llama 4 Community License with usage restrictions and attribution requirements.
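The MoE feature above is why only 17B of Scout's ~109B (or Maverick's ~400B) parameters are "active": a learned gate scores every expert per token and only the selected expert's feed-forward weights run. The following is a toy top-1 routing sketch under made-up sizes — the gate, dimensions, and top-1 choice are illustrative assumptions, not Llama 4's actual routing configuration.

```python
import math
import random

# Toy top-1 MoE routing sketch (illustrative, not Llama 4's real config):
# a linear gate scores each expert for a token and only the top-scoring
# expert processes it, so active parameters << total parameters.

random.seed(0)

NUM_EXPERTS = 16  # Scout-like expert count (illustrative)
HIDDEN = 8        # toy hidden size

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(token_vec, gate_weights):
    """Score all experts with a linear gate; return (expert_index, gate_prob)."""
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]

gate = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(HIDDEN)]
expert, prob = route_top1(token, gate)
print(f"token routed to expert {expert} (gate prob {prob:.2f})")
```

Because only one expert's weights are touched per token here, compute per token scales with the expert size, not the total parameter count — the same principle that lets a 400B-parameter Maverick run 17B parameters per forward pass.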
Example Usage
Example (Python):
from transformers import AutoTokenizer, AutoModelForCausalLM

# Requires transformers >= 4.51.0 (native Llama 4 support) and model access
# (accept the Llama 4 Community License on the Hugging Face model page first).
model_id = 'meta-llama/Llama-4-Scout-17B-16E'

# Load tokenizer and model; device_map='auto' requires the accelerate package
# and spreads layers across the available GPUs/CPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype='auto',  # keep the checkpoint's native precision
    device_map='auto'
)

# Simple generation
inputs = tokenizer('Summarize the following document:\n\n[PASTE LONG DOCUMENT HERE]', return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Notes: for production-scale serving, use Hugging Face Text Generation Inference (TGI) or vLLM.
# See the Hugging Face Llama 4 release post for deployment recommendations: https://huggingface.co/blog/llama4-release
Benchmarks
- MMLU (5-shot): Maverick 85.5%, Scout 79.6%
- MBPP pass@1 (code generation): Maverick 77.6%, Scout 67.8%
- MATH (math reasoning): Maverick 61.2%, Scout 50.3%
- ChartQA (image + chart reasoning): Maverick 85.3%, Scout 83.4%
- Instruction-tuned context window: Scout 10,000,000 tokens, Maverick 1,000,000 tokens
All figures from the Hugging Face release post: https://huggingface.co/blog/llama4-release
Key Information
- Category: Language Models
- Type: AI Language Models Tool