OpenAI GPT-4o API - AI Language Models Tool

Overview

GPT-4o ("o" for omni) is OpenAI's flagship multimodal model family, designed to handle text, images, and audio in a single unified architecture. It was positioned for low-latency conversational experiences, including streaming and function calling, and targeted real-time applications such as voice assistants, multimodal document Q&A, and interactive code generation. According to OpenAI's system card and public documentation, GPT-4o was trained on a mixture of web, code, and multimodal data and ships with product-level safety and moderation measures.

In practice, GPT-4o delivered competitive academic-benchmark performance while emphasizing real-time responsiveness: developer-facing notes and third-party model listings report end-to-end audio latencies in the roughly 232–320 ms range, and an API context advertised at up to 1,000,000 tokens in a limited release, while ChatGPT clients used a smaller 128k-token context. The model was used widely in both ChatGPT and the API. OpenAI announced the retirement of GPT-4o from the ChatGPT consumer interface effective February 13, 2026; at the time of that announcement, the model remained accessible to developers via the API. Key sources include OpenAI's system card, the OpenAI model retirement announcement, and the model README on Replicate.

Key Features

  • Unified multimodal input/output: text, images, and audio in a single model.
  • Real-time audio responsiveness with reported end-to-end latency near 232 ms.
  • Very large API context (advertised up to 1,000,000 tokens in limited release).
  • High coding accuracy: strong reported HumanEval performance on Python tasks.
  • Supports streaming, function calling, tool use, and vision APIs for integrations.
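Function calling works by declaring tools as JSON-schema definitions that are passed alongside the request; the model can then emit a structured call to one of them instead of free-form text. A minimal sketch of such a definition follows. The `get_weather` tool and its parameters are hypothetical, purely for illustration; consult OpenAI's function-calling docs for the authoritative schema shape.

```python
# Hypothetical tool definition for function calling with the Responses API.
# "get_weather" is an illustrative example, not a real OpenAI-provided tool.
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. Berlin",
                },
            },
            "required": ["city"],
        },
    }
]

# This list would be passed alongside the model and input, roughly:
# client.responses.create(model="gpt-4o", input="Weather in Berlin?", tools=tools)
```

If the model decides to call the tool, the response contains the function name and JSON arguments; your code runs the function and sends the result back in a follow-up request.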

Example Usage

Example (python):

from openai import OpenAI

# Basic Responses API example (official OpenAI Python SDK)
# Requires OPENAI_API_KEY in environment
client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="Summarize the following meeting notes and list three action items:\n- Discussed product roadmap\n- Need to fix bug in payments\n- Plan Q2 marketing",
    max_output_tokens=256
)

# Print the model's text output
print(response.output_text)
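Since GPT-4o is multimodal, the same Responses API call can mix text and images by structuring the input as message parts. The sketch below only builds the request payload locally (the image URL is a placeholder, and the commented-out call assumes the same `client` as above); the `input_text`/`input_image` part types follow OpenAI's Responses API conventions, but verify against current docs before relying on them.

```python
# Sketch of a multimodal (vision) request payload mixing text and an image URL.
# The image URL is a placeholder; substitute a real, publicly accessible URL.
image_request = [
    {
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": "Describe this chart in one sentence.",
            },
            {
                "type": "input_image",
                "image_url": "https://example.com/chart.png",
            },
        ],
    }
]

# Passed in place of the plain string input, roughly:
# response = client.responses.create(model="gpt-4o", input=image_request)
# print(response.output_text)
```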

Pricing

OpenAI's public pricing lists legacy ChatGPT/GPT model entries; a legacy entry for chatgpt-4o-latest appears in OpenAI pricing tables at approximately $5.00 per 1M input tokens and $15.00 per 1M output tokens (see OpenAI API pricing pages). Realtime and audio transcription/TTS pricing are shown separately on OpenAI's pricing pages; consult OpenAI's API pricing docs for exact, up-to-date rates and available model variants.
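Per-token rates make per-request costs easy to estimate with simple arithmetic. The helper below uses the legacy rates quoted above ($5.00 per 1M input tokens, $15.00 per 1M output tokens) as an assumption; substitute current rates from OpenAI's pricing page.

```python
# Back-of-envelope cost estimate at the legacy rates quoted above.
# Rates are assumptions taken from this page, not live pricing.
INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt with a 500-token reply:
cost = estimate_cost(2_000, 500)  # 0.01 + 0.0075 = 0.0175 USD
```

Actual billed token counts come back in the API response's usage field, so the same arithmetic can be applied after the fact for exact accounting.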

Benchmarks

MMLU (language understanding): 87.2% (Source: https://replicate.com/openai/gpt-4o)

HumanEval (Python coding): 90.2% (Source: https://replicate.com/openai/gpt-4o)

GSM8K (math word problems): 94.4% (Source: https://replicate.com/openai/gpt-4o)

End-to-end audio latency (reported): ~232–320 ms (Source: https://replicate.com/openai/gpt-4o)

API context window (advertised, limited release): 1,000,000 tokens (Source: https://replicate.com/openai/gpt-4o)

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool