Audio

Text-to-Speech

Step-by-step guide to authenticate, call the audio generation API, and save the returned TTS audio in your app.

This guide uses the universal audio endpoint at /v1/ula/generate-audio. First exchange your API key for a Bearer token, then send a JSON request with provider, model, prompt, and optional voice settings.

How the Flow Works

  1. Create or copy your OneInfer API key.
  2. Call /v1/ula/oauth-authentication to get a Bearer token.
  3. POST JSON to /v1/ula/generate-audio with your TTS provider, model, text prompt, and voice options.
  4. Read the generated audio from data.audios[0] in the response.
  5. If the API returns base64 audio, decode and save it. If it returns a URL, fetch or play that URL directly.

Required Request Shape

json
{
  "provider": "sarvam",
  "model": "bulbul:v3",
  "prompt": "Welcome to OneInfer, your unified AI inference platform.",
  "voice_id": "shubh",
  "format": "mp3",
  "stream": false
}

Common fields: provider selects the TTS backend, model selects the voice model, prompt is the text to synthesize, and voice_id chooses a speaker when the model supports it.
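A small helper can keep these fields consistent across calls. This is a hypothetical convenience function, not part of any SDK; the name and defaults are ours and simply mirror the sample request above:

```python
def build_tts_request(prompt, provider="sarvam", model="bulbul:v3",
                      voice_id=None, fmt="mp3", stream=False):
    """Assemble the JSON body for /v1/ula/generate-audio."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be non-empty text")
    body = {
        "provider": provider,   # TTS backend
        "model": model,         # voice model on that backend
        "prompt": prompt,       # text to synthesize
        "format": fmt,          # output container, e.g. "mp3"
        "stream": stream,
    }
    if voice_id:                # only some models accept a speaker id
        body["voice_id"] = voice_id
    return body
```

Centralizing the body construction makes it harder to ship a request that is missing a required field or carries a voice_id the model ignores.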

Python End-to-End Example

python
import requests
import base64

BASE_URL = "https://api.oneinfer.ai"
API_KEY = "YOUR_API_KEY"

# Step 1: exchange API key for Bearer token
auth_response = requests.post(
    f"{BASE_URL}/v1/ula/oauth-authentication",
    params={"api_key": API_KEY},
    timeout=30,
)
auth_response.raise_for_status()
token = auth_response.json()["access_token"]

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Step 2: send the TTS request
response = requests.post(
    f"{BASE_URL}/v1/ula/generate-audio",
    headers=headers,
    json={
        "provider": "sarvam",
        "model": "bulbul:v3",
        "prompt": "Welcome to OneInfer. This audio was generated through the unified API.",
        "voice_id": "shubh",
        "format": "mp3",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()

payload = response.json()
audio = payload["data"]["audios"][0]

# Step 3: save the returned audio
if audio.get("base64_data"):
    with open("oneinfer-tts.mp3", "wb") as f:
        f.write(base64.b64decode(audio["base64_data"]))
    print("Saved audio to oneinfer-tts.mp3")
elif audio.get("url"):
    # Remote URL: download the file, then write it to disk
    url_response = requests.get(audio["url"], timeout=60)
    url_response.raise_for_status()
    with open("oneinfer-tts.mp3", "wb") as f:
        f.write(url_response.content)
    print("Downloaded audio to oneinfer-tts.mp3")
else:
    raise ValueError("No audio data returned by API")

What the Response Looks Like

json
{
  "api_details": {
    "api_status": "success",
    "api_message": "API has returned response successfully."
  },
  "data": {
    "id": "aud_12345abcde",
    "provider": "sarvam",
    "model": "bulbul:v3",
    "audios": [
      {
        "url": "data:audio/mpeg;base64,...",
        "format": "mp3",
        "base64_data": "...",
        "mime_type": "audio/mpeg"
      }
    ]
  },
  "error": {}
}

In most clients, the value you care about is data.audios[0]. If base64_data is present, decode it and write it to a file. If url is present, use it directly in an audio player or download it.
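Note that the sample response above shows url carrying a data: URI rather than a remote link, so inline audio can arrive in either field. A hedged sketch that extracts raw bytes from one audios entry, covering both base64_data and a base64-encoded data: URI (the helper name is ours; remote http(s) URLs still need a separate download):

```python
import base64

def audio_bytes_from_entry(audio):
    """Return raw audio bytes from one entry of data.audios, if inline."""
    if audio.get("base64_data"):
        return base64.b64decode(audio["base64_data"])
    url = audio.get("url", "")
    if url.startswith("data:"):
        # data:audio/mpeg;base64,<payload>
        header, _, payload = url.partition(",")
        if ";base64" in header:
            return base64.b64decode(payload)
    raise ValueError("no inline audio data; download the URL instead")
```

With bytes in hand, writing the file reduces to a single open(..., "wb") call regardless of which field the server populated.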

Practical Tips

  • Use format: "mp3" unless you explicitly need another format.
  • Keep prompts reasonably short for better latency and easier playback handling.
  • If you need a valid voice_id, check Get Supported Voice for Audio Models first.
  • Use stream: true only when your client is ready to handle chunked audio bytes.
  • Always call response.raise_for_status() or check HTTP status codes before reading the body.
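In the same spirit as the last tip, the response envelope itself can be validated before touching data. The sketch below relies only on the field names shown in the sample response above; the helper name is ours:

```python
def first_audio_or_raise(payload):
    """Check the OneInfer envelope, then return data.audios[0]."""
    details = payload.get("api_details", {})
    if details.get("api_status") != "success":
        # Surface both the message and the error object for debugging
        raise RuntimeError(
            f"API error: {details.get('api_message')} {payload.get('error')}"
        )
    audios = payload.get("data", {}).get("audios") or []
    if not audios:
        raise ValueError("response contained no audio entries")
    return audios[0]
```

Calling this right after response.json() keeps the happy path of your code free of defensive None checks.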