Inference API
OneInfer exposes a single, consistent HTTP API for text, image, video, and audio generation across 15+ providers and 100+ models. Every endpoint uses the same authentication, the same request envelope, and the same response shape.
Chat Completions
Send a conversation history to any supported LLM and receive a response — streamed token-by-token or as a single batched reply. The request format is compatible with the OpenAI Chat Completions shape so existing clients work with minimal changes.
Providers
OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Novita, Fireworks, and more.
Streaming
Set stream: true to receive token-by-token Server-Sent Events instead of waiting for the full response.
System prompts
Pass a system role message to set the model's persona, task constraints, or output format.
Context length
Models range from 8k to 1M+ token context windows. Use GET Models to check limits per model.
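Context limits vary widely per model, so a client can check the catalogue before sending a long prompt. A minimal sketch, assuming the listing endpoint is `GET /v1/ula/models` and that each entry exposes a `context_length` field — both assumptions; the API Reference has the authoritative path and schema:

```python
import requests

API_BASE = "https://api.oneinfer.ai/v1/ula"  # assumed base path

def list_models(token: str) -> list[dict]:
    """Fetch the model catalogue (path and response schema are assumptions)."""
    resp = requests.get(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

def fits_context(model: dict, prompt_tokens: int, max_tokens: int) -> bool:
    """Check a planned request against a model's advertised context window."""
    return prompt_tokens + max_tokens <= model.get("context_length", 0)

# Usage:
# for m in list_models("YOUR_TOKEN"):
#     print(m["id"], m["context_length"])
```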
curl -X POST "https://api.oneinfer.ai/v1/ula/chat/completions" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Explain transformer attention in two sentences." }
],
"max_tokens": 256,
"stream": false
}'

To switch providers, change only the provider and model fields. No other code changes are needed — the response shape stays identical.

Image Generation
Generate images from a text prompt using diffusion models from multiple providers. Control resolution, quality, number of images, and optionally pass a negative prompt to suppress unwanted elements.
Models
DALL-E 3, Stable Diffusion XL, Flux, and other leading diffusion models.
Batch generation
Request multiple images in one call with the n parameter.
Negative prompts
Pass negative_prompt to steer the model away from specific subjects or styles.
Sizes
Control output dimensions with the size field (e.g. 1024x1024, 1792x1024).
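The same request from Python, combining batch generation with a negative prompt. This is a sketch built from the fields described above; the response schema (assumed here to be a JSON body) should be verified against the API Reference:

```python
import requests

API_URL = "https://api.oneinfer.ai/v1/ula/generate-image"

def build_image_payload(prompt: str, negative: str, count: int = 2) -> dict:
    """Assemble the generate-image request body from the documented fields."""
    return {
        "provider": "openai",
        "model": "dall-e-3",
        "prompt": prompt,
        "negative_prompt": negative,  # steer away from unwanted subjects
        "n": count,                   # batch generation in one call
        "size": "1024x1024",
    }

def generate_images(token: str, prompt: str, negative: str, count: int = 2) -> dict:
    """POST the request and return the parsed JSON response."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        json=build_image_payload(prompt, negative, count),
    )
    resp.raise_for_status()
    return resp.json()
```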
curl -X POST "https://api.oneinfer.ai/v1/ula/generate-image" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "dall-e-3",
"prompt": "A futuristic city skyline at dusk, photorealistic",
"n": 1,
"size": "1024x1024",
"quality": "standard"
}'

Video Generation
Create short AI-generated videos from a text prompt or a reference image. Specify resolution, aspect ratio, duration, frames-per-second, and whether to include synthesised audio. Video jobs are asynchronous — the response returns a job ID that you poll until the output URL is ready.
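Because video jobs are asynchronous, a client submits the job and then polls until it completes. A minimal polling loop, assuming a `GET /v1/ula/jobs/{id}` status endpoint that returns `status` and `output_url` fields — these names are assumptions; consult the API Reference for the authoritative schema:

```python
import time
import requests

TERMINAL_STATES = {"succeeded", "failed"}  # assumed status values

def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status in TERMINAL_STATES

def wait_for_video(token: str, job_id: str,
                   interval: float = 5.0, timeout: float = 600.0) -> str:
    """Poll an async video job until it finishes; return the output URL."""
    headers = {"Authorization": f"Bearer {token}"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"https://api.oneinfer.ai/v1/ula/jobs/{job_id}",  # assumed status endpoint
            headers=headers,
        )
        resp.raise_for_status()
        job = resp.json()
        if is_terminal(job["status"]):
            if job["status"] == "failed":
                raise RuntimeError(job.get("error", "video job failed"))
            return job["output_url"]
        time.sleep(interval)  # still queued or running; back off before re-polling
    raise TimeoutError(f"video job {job_id} did not finish within {timeout}s")
```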
Text-to-video
Generate a video from a text description alone.
Image-to-video
Animate a reference image by providing an image URL alongside the prompt.
Audio synthesis
Set generate_audio: true to add AI-synthesised sound to the output.
curl -X POST "https://api.oneinfer.ai/v1/ula/generate-video" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "novita",
"model": "seedance-v1.5-pro-t2v",
"prompt": "A red fox running through a snowy forest",
"resolution": "720P",
"aspect_ratio": "16:9",
"duration": 5,
"fps": 24,
"generate_audio": true
}'

Audio Generation
Convert text to lifelike speech using state-of-the-art TTS models from OpenAI, Groq, and other providers. Control voice, speed, and output format. The response returns an audio file URL or a base64-encoded binary.
Voices
Choose from multiple voice presets per provider — alloy, echo, nova, shimmer, and more.
Output formats
mp3, opus, aac, and flac depending on the provider.
Speed control
Adjust speech rate with the speed parameter (0.25× – 4×).
Providers
OpenAI TTS-1 / TTS-1-HD, Groq PlayAI TTS, and others.
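Since the response carries either an audio file URL or base64-encoded bytes, a small helper can normalise both cases to raw bytes before writing a file. A sketch, assuming the JSON keys are `url` and `b64_audio` — assumed names, so confirm them against the API Reference:

```python
import base64
import requests

def audio_bytes(payload: dict) -> bytes:
    """Return raw audio bytes whether the API sent a URL or base64 data."""
    if "b64_audio" in payload:
        return base64.b64decode(payload["b64_audio"])
    if "url" in payload:
        resp = requests.get(payload["url"])
        resp.raise_for_status()
        return resp.content
    raise ValueError("response contains neither 'b64_audio' nor 'url'")

# Usage:
# data = audio_bytes(response.json())
# with open("speech.mp3", "wb") as f:
#     f.write(data)
```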
curl -X POST "https://api.oneinfer.ai/v1/ula/generate-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "tts-1",
"input": "OneInfer makes AI inference simple.",
"voice": "nova",
"response_format": "mp3",
"speed": 1.0
}'

Streaming
All chat completion endpoints support Server-Sent Events (SSE) streaming. Set stream: true and read the response body as a stream of data: events, each containing a partial token. The stream closes with a final data: [DONE] event.
import json
import requests

response = requests.post(
    "https://api.oneinfer.ai/v1/ula/chat/completions",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "provider": "openai",
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue  # skip SSE keep-alive blank lines
    payload = line.decode().removeprefix("data: ")
    if payload == "[DONE]":
        break  # final sentinel event closes the stream
    chunk = json.loads(payload)
    # OpenAI-compatible chunk shape: partial text arrives in choices[0].delta.content
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)

Next Steps
- See full request/response schemas in the API Reference.
- Browse available models via GET Models.
- Try examples in the Guides tab.