Microsoft Phi-4 - AI Language Models Tool
Overview
Microsoft Phi-4 is a 14‑billion‑parameter, dense decoder‑only transformer in Microsoft’s Phi family, designed for high‑quality reasoning and instruction following. It was trained on a deliberate blend of high‑quality organic data (filtered public domain text, licensed books, code) and large amounts of synthetic, textbook‑style data produced via multi‑stage prompting, self‑revision, and related workflows to improve step‑by‑step reasoning and math performance. The model then underwent supervised fine‑tuning and iterative Direct Preference Optimization (DPO) to improve instruction adherence and safety behavior. ([huggingface.co](https://huggingface.co/microsoft/phi-4))
Phi-4 targets research and production use cases that need strong reasoning in a relatively small, latency- and compute‑efficient footprint. It accepts chat‑style prompts, supports a 16K‑token context window, and was released publicly (Hugging Face and Azure AI Foundry) in December 2024. Microsoft emphasized post‑training safety evaluation and red‑teaming at release; the technical report documents the training scale (≈9.8T tokens), compute (H100 clusters), and evaluation on standard academic benchmarks, where Phi‑4 shows especially strong math and reasoning scores. ([huggingface.co](https://huggingface.co/microsoft/phi-4))
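Because Phi‑4 expects chat‑format input, the sketch below shows roughly what a rendered prompt looks like, assuming the ChatML‑style special tokens described on the model card; in practice it is safer to let the tokenizer's chat template assemble the prompt, as in the later examples.
Example (python):
# Rough sketch of Phi-4's chat prompt layout; exact whitespace and token handling
# should come from the tokenizer's chat template rather than be hard-coded.
prompt = (
    "<|im_start|>system<|im_sep|>You are a helpful assistant.<|im_end|>"
    "<|im_start|>user<|im_sep|>Explain the Internet to a medieval knight.<|im_end|>"
    "<|im_start|>assistant<|im_sep|>"  # left open so the model writes the reply
)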
Model Statistics
- Downloads: 65
- Likes: 55
- Parameters: 14.7B
- License: other
Model Details
- Architecture and size: Phi‑4 is described in Microsoft’s technical report as a 14B‑parameter, dense decoder‑only transformer, optimized for chat‑style generation and stepwise reasoning rather than multimodal tasks (though the Phi family includes multimodal variants).
- Context and compute: the model supports a 16K‑token context window (chat format) and was trained on roughly 9.8 trillion tokens over approximately 21 days on large H100 clusters.
- Training recipe: the team increased the proportion of synthetic, “textbook‑like” tokens in pretraining and applied careful curation and decontamination of organic data.
- Post‑training alignment: Phi‑4 uses supervised fine‑tuning followed by iterative Direct Preference Optimization (DPO) and rejection sampling techniques to refine outputs and improve safety.
- Benchmarks and capabilities: the technical report and Hugging Face model card present Phi‑4 as state‑of‑the‑art for its size on many reasoning benchmarks (see the Benchmarks section below).
- Deployment and license: Microsoft released Phi‑4 under a permissive license and publishes the model files on Hugging Face and through Azure AI Foundry for both research and production deployments. ([ar5iv.org](https://ar5iv.org/pdf/2412.08905))
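As an illustration of the Hugging Face deployment path mentioned above, the following sketch loads the published checkpoint and generates a reply via the tokenizer's built‑in chat template; the dtype and device settings are assumptions for a single‑GPU setup, not requirements from the model card.
Example (python):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the published Phi-4 checkpoint (14B weights need substantial GPU/CPU memory).
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    torch_dtype=torch.bfloat16,  # assumed dtype; "auto" also works
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise math tutor."},
    {"role": "user", "content": "Differentiate x**2 + 3*x."},
]

# Render the chat-format prompt with the tokenizer's chat template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))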
Key Features
- 14B‑parameter dense decoder‑only transformer trained for high‑quality reasoning.
- 16K token context window suitable for long chat and multi‑step problems (a length‑check sketch follows this list).
- Training mix emphasizes synthetic "textbook‑like" data to teach stepwise reasoning.
- Post‑training SFT + iterative DPO to improve instruction following and safety.
- Strong math and reasoning benchmark results relative to its size.
- Designed for latency‑ and compute‑constrained deployments and research use.
- Chat‑format input template and guidance recommended for best output.
- Openly released model files on Hugging Face and available via Azure Foundry.
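Because prompt and completion together must fit in the 16K context window, a simple pre‑flight length check can be useful. The sketch below is illustrative only; the exact 16,384 figure is an assumption derived from the advertised 16K window.
Example (python):
from transformers import AutoTokenizer

PHI4_CONTEXT_WINDOW = 16_384  # assumed exact value of the advertised 16K-token window

def fits_in_context(messages, max_new_tokens=512):
    """Illustrative helper: True if the rendered chat prompt plus the planned
    generation budget stays within Phi-4's context window."""
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
    prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    return len(prompt_ids) + max_new_tokens <= PHI4_CONTEXT_WINDOW

print(fits_in_context([{"role": "user", "content": "Summarize the attached report."}]))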
Example Usage
Example (python):
import transformers
# Example: use the Hugging Face transformers pipeline to run phi-4 (chat format)
# Note: ensure you have sufficient resources and the correct transformers version.
pipeline = transformers.pipeline(
    "text-generation",
    model="microsoft/phi-4",
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
]

outputs = pipeline(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1])
# Source: Hugging Face model card / Phi‑4 technical report for usage guidelines.
Benchmarks
- MMLU (simple-evals): 84.8 (Source: https://huggingface.co/microsoft/phi-4, Phi‑4 model card / technical report)
- GPQA (graduate STEM Q&A): 56.1 (Source: https://huggingface.co/microsoft/phi-4, Phi‑4 model card / technical report)
- MATH (competition math benchmark): 80.4 (Source: https://ar5iv.labs.arxiv.org/html/2412.08905, Phi‑4 Technical Report)
- HumanEval (code generation): 82.6 (Source: https://huggingface.co/microsoft/phi-4, Phi‑4 model card / technical report)
- DROP (reasoning / reading comprehension): 75.5 (Source: https://huggingface.co/microsoft/phi-4, Phi‑4 model card / technical report)
Key Information
- Category: Language Models
- Type: AI Language Models Tool