OpenAI GPT-4o API - AI Language Models Tool
Overview
GPT-4o (the “o” stands for “omni”) is OpenAI’s flagship multimodal model family, which handles text, images, and audio in a single API-first model. It is designed for low-latency, real-time experiences (including conversational voice) while retaining strong reasoning, math, and code-generation capability, which makes it a practical choice for real-time voice assistants, multimodal document Q&A, and advanced coding tools. ([replicate.com](https://replicate.com/openai/gpt-4o?utm_source=openai))
GPT-4o supports image and text inputs via the API and offers audio input/output in ChatGPT and select endpoints, with end-to-end audio latencies in demonstrations reported as low as ~232 ms. The model is offered in multiple snapshots and endpoints (chat, responses, realtime, batch). OpenAI’s model documentation lists a 128k-token context window for GPT-4o; the 1,000,000-token figure that appears in some third-party summaries matches the later GPT-4.1 family rather than GPT-4o, so confirm the limit for a given snapshot on OpenAI’s platform pages. ([replicate.com](https://replicate.com/openai/gpt-4o?utm_source=openai))
Availability, pricing tiers, and snapshot names have been updated multiple times since release; OpenAI publishes per-model pricing and endpoint details on its pricing and models documentation pages. Reviewers and safety assessments noted both strong capability and noteworthy safety considerations during early red-teaming. ([platform.openai.com](https://platform.openai.com/pricing?utm_source=openai))
Key Features
- Unified multimodal input: text and images via API, audio in ChatGPT and select endpoints.
- Low-latency real-time conversational capability (demo audio latencies reported as low as ~232 ms).
- Large context: a 128k-token context window for long-form reasoning (1,000,000-token figures in some third-party summaries correspond to the later GPT-4.1 family).
- High performance on reasoning, math, and coding benchmarks (reported MMLU, GSM8K, and HumanEval scores are listed under Benchmarks).
- Supports streaming, function calling, tool use, and multiple API endpoints (chat, responses, realtime, batch).
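The function-calling feature listed above can be sketched as a request body for the Responses API. This is a minimal sketch rather than a definitive implementation: the `get_weather` tool, its schema, and the `build_tool_request` helper are hypothetical names introduced for illustration, and the flat tool layout is an assumption based on the Responses API function-tool format, so check it against the current API reference before relying on it.

```python
import json


def build_tool_request(user_text: str) -> dict:
    """Build a Responses API request body that offers the model one function tool."""
    # Hypothetical tool: the name, description, and schema are illustrative only.
    weather_tool = {
        "type": "function",
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    }
    return {
        "model": "gpt-4o",
        "input": [{"role": "user", "content": user_text}],
        "tools": [weather_tool],
    }


if __name__ == "__main__":
    # Print the body only; sending it requires an API key and a POST to /v1/responses.
    print(json.dumps(build_tool_request("What is the weather in Oslo?"), indent=2))
```

The model does not execute the tool itself. When it chooses to call one, the application is expected to run the function and return the result in a follow-up request.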
Example Usage
Example (Python):
import os
import requests

# Simple example calling the OpenAI Responses API with model "gpt-4o".
# Requires OPENAI_API_KEY set in the environment (raises KeyError if it is missing).
API_KEY = os.environ["OPENAI_API_KEY"]
endpoint = "https://api.openai.com/v1/responses"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4o",
    "input": [
        {
            "role": "user",
            # Text and image go in one message; "content" is a list of typed parts.
            "content": [
                {"type": "input_text", "text": "Summarize this product page and list three follow-up questions."},
                # Example of including an image input by URL (many GPT-4o endpoints accept image inputs).
                {"type": "input_image", "image_url": "https://example.com/diagram.png"},
            ],
        }
    ],
    "max_output_tokens": 512,
}
# A timeout keeps the call from hanging indefinitely on network problems.
resp = requests.post(endpoint, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
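The example above prints the raw JSON. A small helper can pull out just the generated text; the sketch below assumes the Responses API result carries an `output` list of items, where message items hold `content` parts of type `output_text`. The `extract_output_text` name and the `sample` payload are hypothetical, written by hand for illustration and not real model output.

```python
def extract_output_text(response_json: dict) -> str:
    """Concatenate the text parts of every message item in a Responses API result."""
    chunks = []
    for item in response_json.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls and other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


# Hand-written sample shaped like a Responses API result (assumed layout).
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Summary: ..."}],
        }
    ]
}

if __name__ == "__main__":
    print(extract_output_text(sample))
```

With the request example, this would be called as `extract_output_text(resp.json())`. The official SDK offers its own convenience accessors, which are usually preferable when the SDK is available.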
Note: for streaming, realtime, or audio I/O, use the realtime endpoint or SDK helpers. See the OpenAI platform docs for SDK examples and streaming patterns.
Pricing
OpenAI publishes per-model token pricing; recent public listings show GPT-4o input at about $2.50 and output at about $10.00 per 1M text tokens, with separate rates for audio/image and realtime variants. Exact costs depend on the snapshot and endpoint used (e.g., realtime, mini variants, or audio-enabled previews); consult OpenAI’s pricing page for the latest per-model rates. ([platform.openai.com](https://platform.openai.com/pricing?utm_source=openai))
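To make the per-token rates concrete, the sketch below estimates the text-token cost of one request. It assumes the approximate rates quoted above ($2.50 input and $10.00 output per 1M text tokens), which may be out of date, and it ignores audio, image, and realtime pricing. The `estimate_cost_usd` helper and the token counts are hypothetical.

```python
# Rates copied from the listing above (USD per 1M text tokens); verify before use.
INPUT_USD_PER_1M = 2.50
OUTPUT_USD_PER_1M = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate text-token cost in USD for a single request."""
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative sizes: a 10,000-token prompt with a 512-token reply.
    print(f"${estimate_cost_usd(10_000, 512):.5f}")
```

For a 10,000-token prompt and a 512-token reply, the arithmetic is (10,000 × 2.50 + 512 × 10.00) / 1,000,000, or about $0.03.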
Benchmarks
MMLU (language understanding): 87.2% (Source: https://replicate.com/openai/gpt-4o)
HumanEval (Python coding pass@1 or comparable): 90.2% (Source: https://replicate.com/openai/gpt-4o)
GSM8K (math word problems): 94.4% (Source: https://replicate.com/openai/gpt-4o)
End-to-end audio latency (reported demo figures): as low as ~232 ms, ~320 ms on average (Source: https://replicate.com/openai/gpt-4o)
Context window (API): 128,000 tokens per OpenAI’s model documentation. The 1,000,000-token figure attributed to https://replicate.com/openai/gpt-4o matches the later GPT-4.1 family, not GPT-4o.
Key Information
- Category: Language Models
- Type: AI Language Models Tool