Inference API

OneInfer exposes a single, consistent HTTP API for text, image, video, and audio generation across 15+ providers and 100+ models. Every endpoint uses the same authentication, the same request envelope, and the same response shape.

Chat Completions

Send a conversation history to any supported LLM and receive a response — streamed token-by-token or as a single batched reply. The request format is compatible with the OpenAI Chat Completions shape so existing clients work with minimal changes.

Providers

OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Novita, Fireworks, and more.

Streaming

Set stream: true to receive token-by-token Server-Sent Events instead of waiting for the full response.

System prompts

Pass a system role message to set the model's persona, task constraints, or output format.

Context length

Models range from 8k to 1M+ token context windows. Use the GET Models endpoint to check per-model limits.

curl -X POST "https://api.oneinfer.ai/v1/ula/chat/completions" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain transformer attention in two sentences." }
    ],
    "max_tokens": 256,
    "stream": false
  }'
To switch providers, change the provider and model fields. No other code changes are needed — the response shape stays identical.
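Because only the routing fields change between providers, the envelope can be built once and reused. The helper below is an illustrative sketch, not part of any official OneInfer SDK; it simply assembles the request body shown in the curl example above.

```python
# Illustrative helper: the same request envelope works for every provider;
# only "provider" and "model" change. (Not an official OneInfer SDK function.)
def chat_payload(provider, model, user_message, system=None,
                 max_tokens=256, stream=False):
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    return {
        "provider": provider,
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }

question = "Explain transformer attention in two sentences."
anthropic_req = chat_payload("anthropic", "claude-sonnet-4-6", question,
                             system="You are a helpful assistant.")
openai_req = chat_payload("openai", "gpt-4o-mini", question,
                          system="You are a helpful assistant.")
# Everything except "provider" and "model" is identical between the two.
```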

Image Generation

Generate images from a text prompt using diffusion models from multiple providers. Control resolution, quality, number of images, and optionally pass a negative prompt to suppress unwanted elements.

Models

DALL-E 3, Stable Diffusion XL, Flux, and other leading diffusion models.

Batch generation

Request multiple images in one call with the n parameter.

Negative prompts

Pass negative_prompt to steer the model away from specific subjects or styles.

Sizes

Control output dimensions with the size field (e.g. 1024x1024, 1792x1024).

curl -X POST "https://api.oneinfer.ai/v1/ula/generate-image" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at dusk, photorealistic",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard"
  }'
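The batch (n) and negative-prompt parameters described above can be combined in one request body. The sketch below only assembles the payload; sending it is left to your HTTP client, and whether a given model honors negative_prompt varies by provider, so check the model's documentation.

```python
# Illustrative sketch: an image request using batch generation (n) and a
# negative prompt, per the parameters described above. The prompt values
# are examples only; the field names mirror the curl example.
import json

payload = {
    "provider": "openai",
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at dusk, photorealistic",
    "negative_prompt": "text, watermarks, people",  # steer away from these
    "n": 2,                                         # two images in one call
    "size": "1792x1024",
    "quality": "standard",
}
body = json.dumps(payload)  # ready to POST to /v1/ula/generate-image
```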

Video Generation

Create short AI-generated videos from a text prompt or a reference image. Specify resolution, aspect ratio, duration, frames-per-second, and whether to include synthesised audio. Video jobs are asynchronous — the response returns a job ID that you poll until the output URL is ready.

Text-to-video

Generate a video from a text description alone.

Image-to-video

Animate a reference image by providing an image URL alongside the prompt.

Audio synthesis

Set generate_audio: true to add AI-synthesised sound to the output.

curl -X POST "https://api.oneinfer.ai/v1/ula/generate-video" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "novita",
    "model": "seedance-v1.5-pro-t2v",
    "prompt": "A red fox running through a snowy forest",
    "resolution": "720P",
    "aspect_ratio": "16:9",
    "duration": 5,
    "fps": 24,
    "generate_audio": true
  }'
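Since video jobs are asynchronous, the job ID from the response must be polled until the output URL is ready. The sketch below shows only the polling pattern: the status-fetch function is injected, and the field names used here (status, output_url) are hypothetical; consult the OneInfer job-status documentation for the real endpoint and response schema.

```python
# Polling sketch for asynchronous video jobs. fetch_status(job_id) is any
# callable that returns the job's current state as a dict; in practice it
# would GET the job from the API with your bearer token. The "status" and
# "output_url" keys are illustrative assumptions, not confirmed field names.
import time

def wait_for_video(fetch_status, job_id, interval=5.0, timeout=600.0,
                   sleep=time.sleep):
    """Poll fetch_status(job_id) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] == "completed":
            return job["output_url"]
        if job["status"] == "failed":
            raise RuntimeError(f"video job {job_id} failed")
        sleep(interval)
    raise TimeoutError(f"video job {job_id} did not finish in {timeout}s")
```

Injecting the fetch function keeps the retry logic testable and independent of the HTTP client you use.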

Audio Generation

Convert text to lifelike speech using the dedicated audio generation endpoint. Both MiniMax and Sarvam are supported for TTS, including the Sarvam models bulbul:v2 and bulbul:v3. Use stream: true for chunked audio bytes; use stream: false for JSON metadata with the audios array schema.

Voices

Fetch voice_id values dynamically from /v1/ula/get-supported-voice-for-audio-models for Sarvam models.

Output format

Non-streaming supports mp3, wav, and flac. Streaming is mp3-only per MiniMax HTTP T2A.

Speech controls

Adjust speed, volume, and pitch directly in the request body.

Response shape

Use stream: false for a JSON response with audios: [{url, format, base64_data, mime_type}]. Use stream: true for a live audio byte stream.

curl -X POST "https://api.oneinfer.ai/v1/ula/generate-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "sarvam",
    "model": "bulbul:v3",
    "prompt": "OneInfer makes AI inference simple.",
    "stream": false,
    "voice_id": "shubh",
    "format": "mp3",
    "speed": 1.0,
    "volume": 1.0,
    "pitch": 0
  }'
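With stream: false, the response carries the audios array described above (url, format, base64_data, mime_type), so the clip can be decoded straight from the JSON. The helper below is a sketch against that documented schema; the sample response it runs on is fabricated for illustration.

```python
# Decoding a non-streaming audio response. The audios array schema comes
# from the "Response shape" note above; the sample below is fake data.
import base64

def save_first_audio(response_json, path):
    """Write the first returned clip to disk from its inline base64 data."""
    clip = response_json["audios"][0]
    data = base64.b64decode(clip["base64_data"])
    with open(path, "wb") as f:
        f.write(data)
    return clip["format"], len(data)

sample = {"audios": [{
    "url": "https://example.com/clip.mp3",
    "format": "mp3",
    "base64_data": base64.b64encode(b"ID3...fake-mp3-bytes").decode(),
    "mime_type": "audio/mpeg",
}]}
fmt, size = save_first_audio(sample, "clip.mp3")
```

A real response may also let you download via the url field instead of decoding base64_data.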
const response = await fetch("https://api.oneinfer.ai/v1/ula/generate-audio", {
    method: "POST",
    headers: {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json",
    },
    body: JSON.stringify({
        provider: "sarvam",
        model: "bulbul:v2",
        prompt: "Streaming audio from Sarvam over OneInfer.",
        stream: true,
        format: "mp3",
        voice_id: "anushka"
    })
});

const reader = response.body?.getReader();
if (reader) {
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        console.log("received audio bytes", value.length);
    }
}

Streaming

All chat completion endpoints support Server-Sent Events (SSE) streaming. Set stream: true and read the response body as a stream of data: events, each containing a partial token. The stream closes with a final data: [DONE] event. Audio generation is different: /v1/ula/generate-audio returns chunked audio/mpeg bytes when stream: true.

import requests

response = requests.post(
    "https://api.oneinfer.ai/v1/ula/chat/completions",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "provider": "openai",
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    stream=True,
)

# Each SSE line arrives as bytes; skip keep-alive blanks and the final [DONE].
for line in response.iter_lines():
    if line and line != b"data: [DONE]":
        print(line.decode())
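To turn the raw SSE lines into token text, strip the data: prefix and parse the JSON chunk. The chunk layout assumed below (choices[0].delta.content) follows the OpenAI-compatible shape the chat endpoint mirrors; verify it against a real response before relying on it.

```python
# Extracting the partial token from one SSE line. The chunk schema here
# (choices[0].delta.content) is assumed from OpenAI compatibility, not
# confirmed against a live OneInfer response.
import json

def token_from_sse_line(raw: bytes):
    """Return the partial token carried by one SSE line, or None."""
    line = raw.decode().strip()
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")
```

In the loop above you would then print token_from_sse_line(line) whenever it is not None, concatenating tokens into the full reply.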

Next Steps