openai

gpt-4o-2024-11-20

Context

128K

Input

text, image, audio, video

Output

text, image

Tool calling

Supported

About this model

GPT-4o (the "o" stands for "omni") is OpenAI's multimodal foundation model, with unified input processing across text, vision, and audio. It is optimized for real-time interaction and offers enhanced reasoning and cross-modal understanding.

Capabilities

text

Available through the unified API

image

Available through the unified API

audio

Available through the unified API

video

Available through the unified API

Quick start

curl https://api.oneinfer.ai/v1/chat/completions \
  -H "Authorization: Bearer $ONEINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ffcedefe6617464da928a4e6ec9c27e0",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
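This listing marks tool calling as supported. The sketch below shows what such a request might look like. It assumes the endpoint accepts an OpenAI-style `tools` field, which this page does not confirm. `get_weather` is a hypothetical tool, and `tool_call_payload` and `send_tool_call` are illustrative helper names.

```shell
# Print a chat-completions payload that offers the model one tool.
# Assumption: OneInfer accepts the OpenAI-style "tools" field.
# get_weather is a hypothetical tool used only for illustration.
tool_call_payload() {
  cat <<'JSON'
{
  "model": "ffcedefe6617464da928a4e6ec9c27e0",
  "messages": [
    { "role": "user", "content": "What is the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city (hypothetical tool).",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ]
}
JSON
}

# Send the payload to the endpoint; requires ONEINFER_API_KEY to be set.
send_tool_call() {
  tool_call_payload | curl -sS https://api.oneinfer.ai/v1/chat/completions \
    -H "Authorization: Bearer $ONEINFER_API_KEY" \
    -H "Content-Type: application/json" \
    -d @-
}
```

Calling `send_tool_call` posts the request. Keeping the payload in its own function makes it easy to inspect or edit before anything is sent.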

Providers

Available routing options for this model through OneInfer.

openai

ffcedefe6617464da928a4e6ec9c27e0

Available

Input

$2.500 / 1M

Output

$10.000 / 1M

Routing

OneInfer optimized

Pricing

Current OneInfer pricing for this model.

Usage          Price
Input tokens   $2.500 / 1M
Output tokens  $10.000 / 1M
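As a worked example of the per-token rates listed above, the sketch below estimates the cost of one request from its token counts. The function name `estimate_cost` is illustrative. It assumes billing is strictly linear in tokens, with no minimums, discounts, or separate rates for non-text input.

```shell
# Estimate the USD cost of one request from its token counts.
# Rates come from the listing: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
# Assumption: billing is linear in tokens, with no minimums or discounts.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" 'BEGIN {
    printf "%.6f\n", in_tok / 1000000 * 2.50 + out_tok / 1000000 * 10.00
  }'
}

# 1,000 input tokens and 500 output tokens
estimate_cost 1000 500   # prints 0.007500
```

Output tokens cost four times as much as input tokens at these rates, so long completions tend to dominate the bill.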

Performance

Published evaluation results associated with this model.

Multimodal Understanding

MMMU       78.4
VQAv2      84.7
AudioCaps  82.1

Reasoning Performance

GPQA           46.3
ARC-Challenge  88.9
TheoremQA      51.6

Efficiency & Latency

Time-to-First-Token  220
Audio Latency (ms)   320
Tokens/sec           120
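The figures above can be combined into a rough end-to-end estimate for a streamed text response: time to first token plus output length divided by throughput. The sketch assumes the time-to-first-token value is in milliseconds (the listing gives no unit) and that throughput holds steady at the listed rate. `estimate_latency` is an illustrative name.

```shell
# Rough time in seconds to receive n output tokens.
# Assumptions: time-to-first-token is 220 ms (unit not stated in the listing)
# and generation runs at a steady 120 tokens/sec.
estimate_latency() {
  awk -v n="$1" 'BEGIN { printf "%.3f\n", 220 / 1000 + n / 120 }'
}

# 600 output tokens: 0.220 s to first token + 5.000 s of generation
estimate_latency 600   # prints 5.220
```

Real latency also depends on network conditions, prompt length, and load, so treat the result as a rough guide only.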

Cross-Modal Alignment

Image-Text Retrieval    92.4
Audio-Text Consistency  89.7
Video-Action Matching   85.3
