openai
gpt-4o-2024-11-20
| Property | Value |
|---|---|
| Context | 128K tokens |
| Input | text, image, audio, video |
| Output | text, image |
| Tool calling | Supported |
About this model
GPT-4o ("o" for "omni") is OpenAI's multimodal foundation model, featuring unified input processing across text, vision, and audio. It is listed here as a 1.2 trillion parameter model, a figure OpenAI has not published. The model is optimized for real-time interaction, with enhanced reasoning and cross-modal understanding.
Capabilities
Text, image, audio, and video are each available through the unified API.
Providers
Available routing options for this model through OneInfer.
| Provider | Model ID | Input | Output | Routing |
|---|---|---|---|---|
| openai | ffcedefe6617464da928a4e6ec9c27e0 | $2.500 / 1M | $10.000 / 1M | OneInfer optimized |
Pricing
Current OneInfer pricing for this model.
| Usage | Price |
|---|---|
| Input tokens | $2.500 / 1M |
| Output tokens | $10.000 / 1M |
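As a rough illustration of how the per-million-token rates above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the constant names are hypothetical; only the two prices come from the table, and actual billing may round or meter tokens differently.

```python
# Prices taken from the pricing table above (USD per 1M tokens).
INPUT_PRICE_PER_MILLION = 2.50
OUTPUT_PRICE_PER_MILLION = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a request from its token counts.

    Hypothetical helper: it applies the listed rates linearly and
    ignores any rounding or minimum charges the provider may apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = (input_tokens * INPUT_PRICE_PER_MILLION) / 1_000_000
    output_cost = (output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000
    return input_cost + output_cost
```

For example, `estimate_cost(200_000, 50_000)` combines 200K input tokens at $2.50 per 1M with 50K output tokens at $10.00 per 1M.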
Performance
Published evaluation results associated with this model.
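Multimodal Understanding
| Benchmark | Score |
|---|---|
| MMMU | 78.4 |
| VQAv2 | 84.7 |
| AudioCaps | 82.1 |
Reasoning Performance
| Benchmark | Score |
|---|---|
| GPQA | 46.3 |
| ARC-Challenge | 88.9 |
| TheoremQA | 51.6 |
Efficiency & Latency
| Metric | Value |
|---|---|
| Time-to-First-Token | 220 |
| Audio Latency (ms) | 320 |
| Tokens/sec | 120 |
Cross-Modal Alignment
| Benchmark | Score |
|---|---|
| Image-Text Retrieval | 92.4 |
| Audio-Text Consistency | 89.7 |
| Video-Action Matching | 85.3 |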
API example
curl https://api.oneinfer.ai/v1/chat/completions \
  -H "Authorization: Bearer $ONEINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ffcedefe6617464da928a4e6ec9c27e0",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
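The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the URL, headers, model ID, and message body are copied from the curl example, while the helper names `build_request` and `send` are hypothetical. It assumes the endpoint returns a JSON body, which this page does not document.

```python
import json
import urllib.request

API_URL = "https://api.oneinfer.ai/v1/chat/completions"
MODEL_ID = "ffcedefe6617464da928a4e6ec9c27e0"  # model ID from the curl example


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body as JSON.

    Assumption: the API responds with a JSON object. The exact
    response schema is not described on this page.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would read the key from the `ONEINFER_API_KEY` environment variable and pass the result of `build_request("Hello!", api_key)` to `send`. If the response follows the OpenAI-style chat completions shape (an assumption suggested by the endpoint path), the reply text would be found under `choices[0].message.content`.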