OpenAI GPT-4o API - AI Language Models Tool

Overview

GPT-4o (the “o” for “omni”) is OpenAI’s flagship multimodal model family that unifies text, images, and audio into a single API-first model. It is designed for low-latency, real-time experiences (including conversational voice) while retaining high capability for reasoning, math, and code generation, which makes it a practical choice for real-time voice assistants, multimodal document Q&A, and advanced coding tools. ([replicate.com](https://replicate.com/openai/gpt-4o?utm_source=openai))

GPT-4o supports image and text inputs via the API and offers audio input/output in ChatGPT and select endpoints, with end-to-end audio latencies as low as ~232 ms reported in demonstrations. The model is offered in multiple snapshots and endpoints (chat, responses, realtime, batch). Context limits differ by deployment: ChatGPT deployments list a 128k-token window, while the cited third-party summary reports a 1,000,000-token context via the API in limited release, a figure best confirmed per snapshot in OpenAI’s models documentation. These modal and context differences are documented across OpenAI’s platform pages and third-party summaries. ([replicate.com](https://replicate.com/openai/gpt-4o?utm_source=openai))

Availability, pricing tiers, and snapshot names have been updated multiple times since release; OpenAI publishes per-model pricing and endpoint details on its pricing and models documentation pages. Reviewers and safety assessments have noted both strong capability and noteworthy safety considerations during early red-teaming. ([platform.openai.com](https://platform.openai.com/pricing?utm_source=openai))

Key Features

  • Unified multimodal input: text and images via API, audio in ChatGPT and select endpoints.
  • Low-latency real-time conversational capability (end-to-end audio latencies of roughly 232 ms reported in demonstrations).
  • Large context: a 128k-token window in ChatGPT deployments, with a 1,000,000-token context reported via the API in limited release for long-form reasoning.
  • High performance on reasoning, math, and coding benchmarks (reported scores on MMLU, GSM8K, and HumanEval).
  • Supports streaming, function calling, tool use, and multiple API endpoints (chat, responses, realtime, batch).
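
As a rough illustration of the function-calling feature listed above, the sketch below builds a Responses API request body that offers the model a single function tool. The tool name `get_weather`, its description, and its parameter schema are hypothetical and exist only for this example; the flat tool layout (`type`, `name`, `description`, `parameters`) is assumed to match the Responses API and should be checked against OpenAI's current function-calling documentation. Nothing is sent over the network here.

```python
import json


def build_tool_payload(prompt: str) -> dict:
    """Build a request body that offers the model one function tool."""
    # Hypothetical tool: the name, description, and schema are illustrative only.
    weather_tool = {
        "type": "function",
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    }
    return {
        "model": "gpt-4o",
        "input": prompt,
        "tools": [weather_tool],
    }


payload = build_tool_payload("What is the weather in Paris?")
print(json.dumps(payload, indent=2))
```

If the model decides to call the tool, the application is expected to run the matching function itself and send the result back in a follow-up request; that round trip is omitted from this sketch.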

Example Usage

Example (python):

import os
import requests

# Simple example calling the OpenAI Responses API with model "gpt-4o".
# Requires OPENAI_API_KEY set in the environment; raises KeyError if it is missing,
# rather than sending an invalid "Bearer None" header.
API_KEY = os.environ["OPENAI_API_KEY"]
endpoint = "https://api.openai.com/v1/responses"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "gpt-4o",
    "input": [
        {
            "role": "user",
            # Text and image parts go in one content list on the same message.
            "content": [
                {"type": "input_text", "text": "Summarize this product page and list three follow-up questions."},
                # Image input by URL (many GPT-4o endpoints accept image inputs).
                {"type": "input_image", "image_url": "https://example.com/diagram.png"},
            ],
        }
    ],
    "max_output_tokens": 512,
}

# A timeout keeps the call from hanging indefinitely on network problems.
resp = requests.post(endpoint, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())

# Note: for streaming, realtime, or audio I/O you would use the realtime endpoint or SDK helpers.
# See OpenAI platform docs for SDK examples and streaming patterns.
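
The request above prints the raw JSON response. A minimal sketch of pulling just the generated text out of that JSON follows, assuming the result carries an `output` list of items whose `message` entries hold `output_text` parts. The helper name `extract_output_text` and the `sample` dictionary are illustrative; the sample is hand-written to mimic that shape and is not real model output.

```python
def extract_output_text(response_json: dict) -> str:
    """Concatenate every output_text part found in a Responses API result."""
    chunks = []
    for item in response_json.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


# Hand-written sample shaped like a Responses API result; not real model output.
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Summary: ..."}],
        }
    ]
}
print(extract_output_text(sample))
```

With the earlier example, the call would be `extract_output_text(resp.json())`. The helper returns an empty string when no text parts are present, so callers can tell an empty answer apart from a failed request (which `raise_for_status()` already reports).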

Pricing

OpenAI publishes per-model token pricing; recent public listings show GPT-4o input at about $2.50 and output at about $10.00 per 1M text tokens, with separate rates for audio/image and realtime variants. Exact costs depend on the snapshot and endpoint used (e.g., realtime, mini variants, or audio-enabled previews); consult OpenAI’s pricing page for the latest per-model rates. ([platform.openai.com](https://platform.openai.com/pricing?utm_source=openai))
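
To make the per-token arithmetic concrete, here is a small estimator that uses the approximate text-token rates quoted above ($2.50 input and $10.00 output per 1M tokens). The constant and function names are made up for this sketch, and the rates are placeholders that should be replaced with the current values from OpenAI's pricing page; audio, image, and realtime rates are not modeled.

```python
# Approximate text-token rates quoted in this section (USD per 1M tokens).
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the text-token cost of one request in US dollars."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a 20,000-token prompt with a 512-token reply.
print(f"${estimate_cost_usd(20_000, 512):.4f}")  # prints $0.0551
```

Because output tokens cost four times as much as input tokens at these rates, capping `max_output_tokens` is often the simplest lever for controlling spend.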

Benchmarks

MMLU (language understanding): 87.2% (Source: https://replicate.com/openai/gpt-4o)

HumanEval (Python coding, pass@1 or a comparable metric): 90.2% (Source: https://replicate.com/openai/gpt-4o)

GSM8K (math word problems): 94.4% (Source: https://replicate.com/openai/gpt-4o)

End-to-end audio latency (reported demo range): ~232–320 ms (Source: https://replicate.com/openai/gpt-4o)

Context window (API, limited release): 1,000,000 tokens (Source: https://replicate.com/openai/gpt-4o)

Last Refreshed: 2026-01-17

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool