OpenAI GPT OSS - AI Language Models Tool

Overview

OpenAI GPT OSS is an open-weight family of language models released on August 5, 2025, offered in two mixture-of-experts (MoE) variants: gpt-oss-120b (≈117B total parameters, ≈5.1B active parameters per token) and gpt-oss-20b (≈21B total parameters, ≈3.6B active parameters per token). The models are designed for chain-of-thought reasoning, tool use (browser and Python container tools), and long-context workflows, with a native 128k-token context. MXFP4 4-bit quantization of the MoE weights keeps inference efficient: the 120B variant can be hosted on a single 80GB-class GPU (e.g., NVIDIA H100 / AMD MI300X), while the 20B variant runs within ~16GB of memory for local and edge deployments. ([openai.com](https://openai.com/index/introducing-gpt-oss/))

The release is fully open under the Apache-2.0 license and includes reference implementations, the Harmony chat format, and integration examples for Transformers, vLLM, llama.cpp, and hosted inference providers (Hugging Face, Azure, AWS). Community adoption was rapid after launch, with millions of downloads and many community-derived builds; the repository and model pages provide guides for local inference, quantized conversions, and fine-tuning workflows.

For developers, GPT OSS is positioned as a pragmatic, audit-friendly option for research, on-premises deployment, and product development where control over model weights, fine-tuning, and inspection of chain-of-thought is required. ([github.com](https://github.com/openai/gpt-oss))
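
The memory figures above can be sanity-checked with a back-of-the-envelope calculation. A minimal sketch (the 4.25 bits-per-parameter figure is an assumption approximating MXFP4 plus block-scale overhead, and the estimate ignores activations and KV cache):

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight-only memory footprint in gigabytes."""
    return params * bits_per_param / 8 / 1e9

# gpt-oss-120b (~117B total parameters):
print(round(weight_gb(117e9, 16.0), 1))  # bf16: 234.0 GB -- far beyond one GPU
print(round(weight_gb(117e9, 4.25), 1))  # ~4-bit MXFP4: 62.2 GB -- within 80 GB
```

This rough arithmetic is consistent with the claim that 4-bit MoE quantization is what makes single-GPU hosting of the 120B variant feasible.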

Key Features

  • Two MoE variants: gpt-oss-120b (117B) and gpt-oss-20b (21B) for different deployment targets.
  • MXFP4 4-bit quantization applied to MoE weights for smaller memory footprint and faster inference.
  • Native 128k-token context for long documents, codebases, and multi-step reasoning.
  • Configurable reasoning effort (low / medium / high) to trade latency for performance.
  • Built-in agent/tool support: browser and Python-execution tools included as reference implementations.
  • Permissive Apache-2.0 license enabling commercial use, redistribution, and local deployment.
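
The configurable reasoning effort listed above is selected in the system prompt. A minimal sketch of building such a message list (the exact "Reasoning: high" phrasing follows the model card, but verify against the gpt-oss README for your client):

```python
def build_messages(user_prompt: str, effort: str = "medium") -> list:
    """Prepend a system message selecting the reasoning-effort level
    (low / medium / high). Phrasing per the gpt-oss model card; confirm
    against the README for your inference stack."""
    assert effort in {"low", "medium", "high"}
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Prove that sqrt(2) is irrational.", effort="high")
```

The resulting list can be passed to a chat-capable client (such as the Transformers pipeline shown in the next section) unchanged.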

Example Usage

Example (python):

from transformers import pipeline

# Example: run a chat-style generation with the Hugging Face Transformers pipeline.
# (See OpenAI/gpt-oss README for Harmony format details and recommended clients.)
model_id = "openai/gpt-oss-120b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the concept of Fourier transforms in simple terms."},
]

outputs = pipe(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last entry is the assistant reply.
print(outputs[0]["generated_text"][-1])

# Notes: production usage should follow the Harmony chat format and may prefer vLLM/Responses API for performance.
# Reference: https://github.com/openai/gpt-oss
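
For the vLLM path mentioned in the notes, vLLM exposes an OpenAI-compatible HTTP endpoint. A minimal stdlib-only sketch of building such a request (the localhost URL and port are assumptions about a locally started `vllm serve openai/gpt-oss-120b` process):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST request against an OpenAI-compatible
    /v1/chat/completions endpoint, e.g. a local vLLM server."""
    payload = {"model": model, "messages": messages, "max_tokens": 256}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000",  # assumed address of a local vLLM server
    "openai/gpt-oss-120b",
    [{"role": "user", "content": "Summarize the Fourier transform in one sentence."}],
)
# With a running server, send it and read the reply:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```

Using the stock `openai` Python client pointed at the same base URL is equivalent; the sketch above just avoids the extra dependency.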

Pricing

Free, open-weight release licensed under Apache 2.0 — model weights and reference code are downloadable without purchase (host/infra costs still apply). See OpenAI and Hugging Face model pages for distribution details.

Benchmarks

gpt-oss-120b — total parameters: ≈117 billion (Source: https://openai.com/index/introducing-gpt-oss/)

gpt-oss-120b — active parameters per token: ≈5.1 billion (Source: https://openai.com/index/introducing-gpt-oss/)

gpt-oss-20b — total parameters: ≈21 billion (Source: https://openai.com/index/introducing-gpt-oss/)

gpt-oss-20b — active parameters per token: ≈3.6 billion (Source: https://openai.com/index/introducing-gpt-oss/)

Context length (both models): 128k tokens (Source: https://openai.com/index/introducing-gpt-oss/)

License and distribution: Apache 2.0 (open weights, downloadable on Hugging Face / GitHub) (Source: https://github.com/openai/gpt-oss)

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool