Inference API

OneInfer exposes a single, consistent HTTP API for text, image, video, and audio generation across 15+ providers and 100+ models. Every endpoint uses the same authentication, the same request envelope, and the same response shape.

Chat Completions

Send a conversation history to any supported LLM and receive a response — streamed token-by-token or as a single batched reply. The request format is compatible with the OpenAI Chat Completions shape so existing clients work with minimal changes.

Providers

OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Novita, Fireworks, and more.

Streaming

Set stream: true to receive token-by-token Server-Sent Events instead of waiting for the full response.

System prompts

Pass a system role message to set the model's persona, task constraints, or output format.

Context length

Models range from 8k to 1M+ token context windows. Use GET Models to check limits per model.
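Before sending a long conversation, it is worth checking that the prompt plus the requested completion fits the model's window. A trivial budgeting helper (the context_length field name is an assumption about what GET Models returns; verify against the actual response):

```python
def fits_context(model_info: dict, prompt_tokens: int, max_tokens: int) -> bool:
    """True if the prompt plus the requested completion fits the model's window."""
    return prompt_tokens + max_tokens <= model_info["context_length"]

# Hypothetical entry as GET Models might return it:
model = {"id": "claude-sonnet-4-6", "context_length": 200_000}
```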

curl -X POST "https://api.oneinfer.ai/v1/ula/chat/completions" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain transformer attention in two sentences." }
    ],
    "max_tokens": 256,
    "stream": false
  }'

To switch providers, change the provider and model fields. No other code changes are needed — the response shape stays identical.
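The same idea in Python: the request envelope is fixed and only provider and model vary. A minimal sketch (envelope copied from the curl example above; the requests library is assumed):

```python
import requests

API_URL = "https://api.oneinfer.ai/v1/ula/chat/completions"

def build_chat_payload(provider: str, model: str, messages: list[dict],
                       max_tokens: int = 256, stream: bool = False) -> dict:
    """Assemble the shared request envelope; only provider/model differ per backend."""
    return {
        "provider": provider,
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }

def chat(provider: str, model: str, messages: list[dict], token: str) -> dict:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        json=build_chat_payload(provider, model, messages),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Swapping backends is a one-line change:
# chat("anthropic", "claude-sonnet-4-6", msgs, token)
# chat("openai",    "gpt-4o-mini",       msgs, token)
```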

Image Generation

Generate images from a text prompt using diffusion models from multiple providers. Control resolution, quality, number of images, and optionally pass a negative prompt to suppress unwanted elements.

Models

DALL-E 3, Stable Diffusion XL, Flux, and other leading diffusion models.

Batch generation

Request multiple images in one call with the n parameter.

Negative prompts

Pass negative_prompt to steer the model away from specific subjects or styles.

Sizes

Control output dimensions with the size field (e.g. 1024x1024, 1792x1024).

curl -X POST "https://api.oneinfer.ai/v1/ula/generate-image" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at dusk, photorealistic",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard"
  }'

Video Generation

Create short AI-generated videos from a text prompt or a reference image. Specify resolution, aspect ratio, duration, frames per second, and whether to include synthesised audio. Video jobs are asynchronous — the response returns a job ID that you poll until the output URL is ready.

Text-to-video

Generate a video from a text description alone.

Image-to-video

Animate a reference image by providing an image URL alongside the prompt.

Audio synthesis

Set generate_audio: true to add AI-synthesised sound to the output.

curl -X POST "https://api.oneinfer.ai/v1/ula/generate-video" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "novita",
    "model": "seedance-v1.5-pro-t2v",
    "prompt": "A red fox running through a snowy forest",
    "resolution": "720P",
    "aspect_ratio": "16:9",
    "duration": 5,
    "fps": 24,
    "generate_audio": true
  }'

Audio Generation

Convert text to lifelike speech using state-of-the-art TTS models from OpenAI, Groq, and other providers. Control voice, speed, and output format. The response returns an audio file URL or a base64-encoded binary.

Voices

Choose from multiple voice presets per provider — alloy, echo, nova, shimmer, and more.

Output formats

mp3, opus, aac, and flac depending on the provider.

Speed control

Adjust speech rate with the speed parameter (0.25× – 4×).

Providers

OpenAI TTS-1 / TTS-1-HD, Groq PlayAI TTS, and others.

curl -X POST "https://api.oneinfer.ai/v1/ula/generate-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "tts-1",
    "input": "OneInfer makes AI inference simple.",
    "voice": "nova",
    "response_format": "mp3",
    "speed": 1.0
  }'
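Per the note above, the response carries either a hosted file URL or base64-encoded bytes. A small helper covering both cases (the audio_url and audio_base64 field names are assumptions; check your provider's actual response shape):

```python
import base64
import requests

def audio_bytes(payload: dict) -> bytes:
    """Return raw audio whichever response style the provider used."""
    if "audio_base64" in payload:
        return base64.b64decode(payload["audio_base64"])
    # Otherwise download the hosted file.
    resp = requests.get(payload["audio_url"], timeout=30)
    resp.raise_for_status()
    return resp.content

# with open("speech.mp3", "wb") as f:
#     f.write(audio_bytes(response.json()))
```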

Streaming

All chat completion endpoints support Server-Sent Events (SSE) streaming. Set stream: true and read the response body as a stream of data: events, each carrying an incremental chunk of the reply. The stream closes with a final data: [DONE] event.

import requests

response = requests.post(
    "https://api.oneinfer.ai/v1/ula/chat/completions",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "provider": "openai",
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if line and line != b"data: [DONE]":
        print(line.decode())

Next Steps