Audio Generation API

Generate high-quality audio from text (Text-to-Speech) or transcribe audio files (Speech-to-Text) through the universal API, which supports state-of-the-art audio models for both tasks.

01 Request Headers

Authorization (string, required)

Bearer token for authentication. Format: Bearer <YOUR_TOKEN>. Exchange your API key for a token via the Authentication endpoint.

Content-Type (string, required)

Set to application/json for all audio generation requests.
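As a minimal sketch, the two required headers could be assembled like this in Python. The helper name `build_headers` is hypothetical, and the token is assumed to have already been obtained from the Authentication endpoint.

```python
def build_headers(token: str) -> dict[str, str]:
    """Return the two headers this section lists as required."""
    if not token:
        raise ValueError("a bearer token is required")
    return {
        # Format described above: Bearer <YOUR_TOKEN>
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


headers = build_headers("YOUR_TOKEN")
```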

02 Request Parameters

provider (string, required)

Audio provider. Supported: 'minimax' and 'sarvam'. Use the GET Models endpoint to retrieve available provider and model details.

model (string, required)

Audio model name. For Sarvam TTS use 'bulbul:v2' or 'bulbul:v3'.

prompt (string, required)

Text to synthesize into speech.

stream (boolean)

If true, returns chunked audio bytes directly instead of JSON metadata.

voice_id (string)

Voice identifier. For Sarvam models, fetch supported values from the Get Supported Voice for Audio Models endpoint.

format (string)

Output audio format (e.g. mp3, wav). For MiniMax streaming, mp3 is required.
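The parameter rules above can be sketched as a small client-side payload builder. This is only an illustration: `build_tts_payload` and its argument names are hypothetical, and the check that rejects a non-mp3 format for MiniMax streaming is an assumed reading of the rule stated above, not documented server behavior.

```python
from typing import Any, Optional

# Providers named in this section; the GET Models endpoint is the
# authoritative source and may list more.
SUPPORTED_PROVIDERS = {"minimax", "sarvam"}


def build_tts_payload(
    provider: str,
    model: str,
    prompt: str,
    stream: bool = False,
    voice_id: Optional[str] = None,
    audio_format: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body from the documented parameters."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if not model:
        raise ValueError("model is required")
    if not prompt:
        raise ValueError("prompt is required")
    # Assumption: an explicit non-mp3 format is rejected client-side
    # when streaming from MiniMax, since mp3 is required in that case.
    if provider == "minimax" and stream and audio_format not in (None, "mp3"):
        raise ValueError("MiniMax streaming requires format 'mp3'")

    payload: dict[str, Any] = {
        "provider": provider,
        "model": model,
        "prompt": prompt,
        "stream": stream,
    }
    # Optional fields are only sent when the caller sets them.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if audio_format is not None:
        payload["format"] = audio_format
    return payload


payload = build_tts_payload(
    "sarvam",
    "bulbul:v3",
    "Namaste from OneInfer audio generation.",
    voice_id="shubh",
    audio_format="mp3",
)
```

Validating before sending avoids a round trip that would otherwise likely end in a 400 or 422 response.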

03 Example Request

Example TTS Request

{
    "provider": "sarvam",
    "model": "bulbul:v3",
    "prompt": "Namaste from OneInfer audio generation.",
    "stream": false,
    "voice_id": "shubh",
    "format": "mp3",
    "speed": 1.0,
    "volume": 1.0,
    "pitch": 0
}
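A hedged sketch of sending the request body above with Python's standard library follows. The URL in `API_URL` is a placeholder, since the endpoint path is not given in this section, and `build_request` and `send` are hypothetical helper names. The example only builds the request object; calling `send` would perform the actual network call.

```python
import json
import urllib.request

# Placeholder: substitute the real audio generation endpoint URL.
API_URL = "https://api.example.com/audio/generate"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and parse the JSON reply (non-streaming case)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "provider": "sarvam",
    "model": "bulbul:v3",
    "prompt": "Namaste from OneInfer audio generation.",
    "stream": False,
    "voice_id": "shubh",
    "format": "mp3",
    "speed": 1.0,
    "volume": 1.0,
    "pitch": 0,
}
req = build_request("YOUR_TOKEN", payload)
```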

04 Response

{
    "api_details": {
        "api_status": "success",
        "api_message": "API has returned response successfully."
    },
    "data": {
        "id": "aud_12345abcde",
        "created": 1711468800,
        "text": "Generated audio for prompt: Namaste from OneInfer audio generation.",
        "finish_reason": "stop",
        "provider": "sarvam",
        "model": "bulbul:v3",
        "usage": {
            "prompt_tokens": 7,
            "completion_tokens": 0,
            "total_tokens": 7
        },
        "latency_ms": 980,
        "audios": [
            {
                "url": "data:audio/mpeg;base64,....",
                "format": "mp3",
                "base64_data": "...",
                "mime_type": "audio/mpeg"
            }
        ]
    },
    "error": {}
}
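One way to consume a non-streaming response of this shape is to check `api_status` and decode each entry's `base64_data` into raw bytes. The sketch below assumes that `base64_data` holds standard base64 and that a failed call reports a status other than "success". The function name `extract_audio` and the sample bytes are illustrative only.

```python
import base64


def extract_audio(response: dict) -> list[tuple[str, bytes]]:
    """Return (format, raw_bytes) for every entry in data.audios."""
    details = response.get("api_details", {})
    if details.get("api_status") != "success":
        raise RuntimeError(details.get("api_message", "audio generation failed"))
    clips = []
    for audio in response.get("data", {}).get("audios", []):
        raw = base64.b64decode(audio["base64_data"])
        clips.append((audio["format"], raw))
    return clips


# Stand-in response with made-up bytes in place of real audio.
sample = {
    "api_details": {"api_status": "success", "api_message": "ok"},
    "data": {
        "audios": [
            {
                "format": "mp3",
                "base64_data": base64.b64encode(b"fake-mp3-bytes").decode("ascii"),
                "mime_type": "audio/mpeg",
            }
        ]
    },
    "error": {},
}
clips = extract_audio(sample)
```

Each decoded clip can then be written to disk with a file opened in binary mode, using the returned format as the file extension.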

Status Codes

Code	Status	Description
200	OK	Audio generated successfully.
400	Bad Request	Invalid request body or unsupported provider/model.
401	Unauthorized	Missing or invalid Authorization header / Bearer token.
403	Forbidden	Insufficient credit balance.
422	Unprocessable Entity	Request body failed schema validation.
500	Internal Server Error	Unexpected error during audio generation.
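A client might map the status codes in the table to a next step, as in the rough sketch below. The action labels are suggestions rather than part of the API, and treating 500 as worth retrying is an assumption.

```python
# Suggested client-side reactions, keyed by the HTTP status codes above.
STATUS_ACTIONS = {
    200: "ok",
    400: "fix_request",      # invalid body or unsupported provider/model
    401: "refresh_token",    # missing or invalid Bearer token
    403: "add_credit",       # insufficient credit balance
    422: "fix_request",      # body failed schema validation
    500: "retry_or_report",  # unexpected server-side error
}


def action_for(status_code: int) -> str:
    """Return a suggested next step for a status code from the table."""
    return STATUS_ACTIONS.get(status_code, "unknown")
```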