Audio
Text-to-Speech
Step-by-step guide to authenticate, call the audio generation API, and save the returned TTS audio in your app.
This guide uses the universal audio endpoint at `/v1/ula/generate-audio`. First exchange your API key for a Bearer token, then send a JSON request with `provider`, `model`, `prompt`, and optional voice settings.

How the Flow Works
- Create or copy your OneInfer API key.
- Call `/v1/ula/oauth-authentication` to get a Bearer token.
- POST JSON to `/v1/ula/generate-audio` with your TTS provider, model, text prompt, and voice options.
- Read the generated audio from `data.audios[0]` in the response.
- If the API returns base64 audio, decode and save it. If it returns a URL, fetch or play that URL directly.
Required Request Shape
```json
{
  "provider": "sarvam",
  "model": "bulbul:v3",
  "prompt": "Welcome to OneInfer, your unified AI inference platform.",
  "voice_id": "shubh",
  "format": "mp3",
  "stream": false
}
```

Common fields: `provider` selects the TTS backend, `model` selects the voice model, `prompt` is the text to synthesize, and `voice_id` chooses a speaker when the model supports it.
Python End-to-End Example
```python
import base64

import requests

BASE_URL = "https://api.oneinfer.ai"
API_KEY = "YOUR_API_KEY"

# Step 1: exchange API key for Bearer token
auth_response = requests.post(
    f"{BASE_URL}/v1/ula/oauth-authentication",
    params={"api_key": API_KEY},
    timeout=30,
)
auth_response.raise_for_status()
token = auth_response.json()["access_token"]

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Step 2: send the TTS request
response = requests.post(
    f"{BASE_URL}/v1/ula/generate-audio",
    headers=headers,
    json={
        "provider": "sarvam",
        "model": "bulbul:v3",
        "prompt": "Welcome to OneInfer. This audio was generated through the unified API.",
        "voice_id": "shubh",
        "format": "mp3",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
payload = response.json()
audio = payload["data"]["audios"][0]

# Step 3: save the returned audio
if audio.get("base64_data"):
    with open("oneinfer-tts.mp3", "wb") as f:
        f.write(base64.b64decode(audio["base64_data"]))
    print("Saved audio to oneinfer-tts.mp3")
elif audio.get("url"):
    audio_response = requests.get(audio["url"], timeout=60)
    audio_response.raise_for_status()
    with open("oneinfer-tts.mp3", "wb") as f:
        f.write(audio_response.content)
    print("Downloaded audio to oneinfer-tts.mp3")
else:
    raise ValueError("No audio data returned by API")
```

What the Response Looks Like
```json
{
  "api_details": {
    "api_status": "success",
    "api_message": "API has returned response successfully."
  },
  "data": {
    "id": "aud_12345abcde",
    "provider": "sarvam",
    "model": "bulbul:v3",
    "audios": [
      {
        "url": "data:audio/mpeg;base64,...",
        "format": "mp3",
        "base64_data": "...",
        "mime_type": "audio/mpeg"
      }
    ]
  },
  "error": {}
}
```

In most clients, the value you care about is `data.audios[0]`. If `base64_data` is present, decode it and write it to a file. If `url` is present, use it directly in an audio player or download it.
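That branching logic can be wrapped in a small helper so the rest of your app only deals in raw bytes. This is a sketch, not part of any official client: it assumes the response has already been parsed into a dict, and it prefers inline base64 data over a URL when both are present.

```python
import base64

import requests


def extract_audio_bytes(payload: dict, timeout: int = 60) -> bytes:
    """Return raw audio bytes from a parsed generate-audio response.

    Prefers inline base64 data; falls back to downloading the URL.
    """
    audio = payload["data"]["audios"][0]
    if audio.get("base64_data"):
        return base64.b64decode(audio["base64_data"])
    if audio.get("url"):
        resp = requests.get(audio["url"], timeout=timeout)
        resp.raise_for_status()
        return resp.content
    raise ValueError("No audio data returned by API")
```

With this in place, saving a file reduces to `open("out.mp3", "wb").write(extract_audio_bytes(payload))`.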
Practical Tips
- Use `format: "mp3"` unless you explicitly need another format.
- Keep prompts reasonably short for better latency and easier playback handling.
- If you need a valid `voice_id`, check Get Supported Voice for Audio Models first.
- Use `stream: true` only when your client is ready to handle chunked audio bytes.
- Always call `response.raise_for_status()` or check HTTP status codes before reading the body.