Inference API
OneInfer exposes a single, consistent HTTP API for text, image, video, and audio generation across 15+ providers and 100+ models. Every endpoint uses the same authentication, the same request envelope, and the same response shape.
Chat Completions
Send a conversation history to any supported LLM and receive a response — streamed token-by-token or as a single batched reply. The request format is compatible with the OpenAI Chat Completions shape so existing clients work with minimal changes.
Providers
OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Novita, Fireworks, and more.
Streaming
Set stream: true to receive token-by-token Server-Sent Events instead of waiting for the full response.
System prompts
Pass a system role message to set the model's persona, task constraints, or output format.
Context length
Models range from 8k to 1M+ token context windows. Use GET Models to check limits per model.
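Every chat request uses the same envelope regardless of provider. As a sketch (not an official SDK), here is how the request body might be assembled in Python before sending it with any HTTP client; the helper name is ours, and the provider/model values are just examples:

```python
def build_chat_request(messages, provider, model, max_tokens=256, stream=False):
    """Assemble a OneInfer chat-completions request body.

    Every provider shares this envelope: provider + model select the
    backend, and the remaining fields pass through unchanged.
    """
    return {
        "provider": provider,
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }

body = build_chat_request(
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Explain transformer attention in two sentences."}],
    provider="anthropic",
    model="claude-sonnet-4-6",
)
```

POST the resulting dict as JSON to /v1/ula/chat/completions with your bearer token, exactly as in the curl example below.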
curl -X POST "https://api.oneinfer.ai/v1/ula/chat/completions" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Explain transformer attention in two sentences." }
],
"max_tokens": 256,
"stream": false
}'
To switch models or providers, change only the provider and model fields. No other code changes are needed; the response shape stays identical.
Image Generation
Generate images from a text prompt using diffusion models from multiple providers. Control resolution, quality, number of images, and optionally pass a negative prompt to suppress unwanted elements.
Models
DALL-E 3, Stable Diffusion XL, Flux, and other leading diffusion models.
Batch generation
Request multiple images in one call with the n parameter.
Negative prompts
Pass negative_prompt to steer the model away from specific subjects or styles.
Sizes
Control output dimensions with the size field (e.g. 1024x1024, 1792x1024).
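Because size is a WIDTHxHEIGHT string, it can be validated client-side before the request is sent. A minimal sketch, with illustrative helper names (the set of sizes a given model accepts varies, so check the model's documentation):

```python
def parse_size(size: str):
    """Split a WIDTHxHEIGHT string such as '1024x1024' into integers."""
    width, height = size.lower().split("x")
    return int(width), int(height)

def build_image_request(prompt, provider, model, n=1,
                        size="1024x1024", negative_prompt=None):
    """Assemble a generate-image request body; negative_prompt is optional."""
    w, h = parse_size(size)  # fail fast on a malformed size string
    body = {"provider": provider, "model": model, "prompt": prompt,
            "n": n, "size": f"{w}x{h}"}
    if negative_prompt:
        body["negative_prompt"] = negative_prompt
    return body
```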
curl -X POST "https://api.oneinfer.ai/v1/ula/generate-image" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "dall-e-3",
"prompt": "A futuristic city skyline at dusk, photorealistic",
"n": 1,
"size": "1024x1024",
"quality": "standard"
}'
Video Generation
Create short AI-generated videos from a text prompt or a reference image. Specify resolution, aspect ratio, duration, frames-per-second, and whether to include synthesised audio. Video jobs are asynchronous — the response returns a job ID that you poll until the output URL is ready.
Text-to-video
Generate a video from a text description alone.
Image-to-video
Animate a reference image by providing an image URL alongside the prompt.
Audio synthesis
Set generate_audio: true to add AI-synthesised sound to the output.
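Because video jobs are asynchronous, the client submits the job and then polls until the output URL is ready. The status values and field names below are assumptions for illustration (the exact job-status schema is in the API Reference); the polling loop itself is generic, taking any callable that fetches the current job state:

```python
import time

def poll_until_ready(fetch_status, interval=5.0, timeout=600.0):
    """Poll a job-status callable until the job finishes.

    fetch_status() should return a dict with at least a 'status' key.
    The 'completed'/'failed' values here are illustrative, not the
    documented schema.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") == "completed":
            return job  # expected to carry the output video URL
        if job.get("status") == "failed":
            raise RuntimeError(f"video job failed: {job}")
        time.sleep(interval)
    raise TimeoutError("video job did not finish in time")
```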
curl -X POST "https://api.oneinfer.ai/v1/ula/generate-video" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "novita",
"model": "seedance-v1.5-pro-t2v",
"prompt": "A red fox running through a snowy forest",
"resolution": "720P",
"aspect_ratio": "16:9",
"duration": 5,
"fps": 24,
"generate_audio": true
}'
Audio Generation
Convert text to lifelike speech using the dedicated audio generation endpoint. Both MiniMax and Sarvam are supported for TTS, including the Sarvam models bulbul:v2 and bulbul:v3. Use stream: true for chunked audio bytes; use stream: false for JSON metadata with the audios array schema.
Voices
Fetch voice_id values dynamically from /v1/ula/get-supported-voice-for-audio-models for Sarvam models.
Output format
Non-streaming supports mp3, wav, and flac. Streaming is mp3-only per MiniMax HTTP T2A.
Speech controls
Adjust speed, volume, and pitch directly in the request body.
Response shape
Use stream: false for a JSON response with audios: [{url, format, base64_data, mime_type}]. Use stream: true for a live audio byte stream.
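With stream: false, each entry in the audios array may carry base64_data. A small sketch of decoding those entries to files; the field names follow the schema above, while the helper name and URL-only handling are our assumptions:

```python
import base64

def save_audio_items(response_json, prefix="out"):
    """Write each base64-encoded entry in the audios array to disk.

    Assumes the non-streaming schema described above:
    audios: [{url, format, base64_data, mime_type}].
    """
    paths = []
    for i, item in enumerate(response_json.get("audios", [])):
        if not item.get("base64_data"):
            continue  # URL-only entries would be downloaded separately
        path = f"{prefix}_{i}.{item.get('format', 'mp3')}"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["base64_data"]))
        paths.append(path)
    return paths
```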
curl -X POST "https://api.oneinfer.ai/v1/ula/generate-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider": "sarvam",
"model": "bulbul:v3",
"prompt": "OneInfer makes AI inference simple.",
"stream": false,
"voice_id": "shubh",
"format": "mp3",
"speed": 1.0,
"volume": 1.0,
"pitch": 0
}'
const response = await fetch("https://api.oneinfer.ai/v1/ula/generate-audio", {
method: "POST",
headers: {
"Authorization": "Bearer YOUR_TOKEN",
"Content-Type": "application/json",
},
body: JSON.stringify({
provider: "sarvam",
model: "bulbul:v2",
prompt: "Streaming audio from Sarvam over OneInfer.",
stream: true,
format: "mp3",
voice_id: "anushka"
})
});
const reader = response.body?.getReader();
while (reader) {
const { done, value } = await reader.read();
if (done) break;
console.log("received audio bytes", value?.length || 0);
}
Streaming
All chat completion endpoints support Server-Sent Events (SSE) streaming. Set stream: true and read the response body as a stream of data: events, each containing a partial token. The stream closes with a final data: [DONE] event. Audio generation is different: /v1/ula/generate-audio returns chunked audio/mpeg bytes when stream: true.
import requests
response = requests.post(
"https://api.oneinfer.ai/v1/ula/chat/completions",
headers={"Authorization": "Bearer YOUR_TOKEN"},
json={
"provider": "openai",
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Tell me a story."}],
"stream": True,
},
stream=True,
)
for line in response.iter_lines():
if line and line != b"data: [DONE]":
print(line.decode())
Next Steps
- See full request/response schemas in the API Reference.
- Browse available models via GET Models.
- Try examples in the Guides tab.