sarvam

bulbul:v3

Context

2000

Input

text

Output

audio

Tool calling

Not listed

About this model

Bulbul v3 is a production-grade text-to-speech (TTS) model optimized for 11 Indian languages. It natively supports code-mixed speech and professional voice cloning, and claims industry-leading stability in telephony environments (8 kHz audio). It infers prosody, emphasis, and emotional tone automatically from the input text.

Capabilities

text

Available through the unified API

audio

Available through the unified API

Quick start

curl https://api.oneinfer.ai/v1/chat/completions \
  -H "Authorization: Bearer $ONEINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "378cadf87af34176971cb0e62950b7c4",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
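The quick start always sends the fixed text "Hello!". The following minimal shell sketch sends arbitrary text through the same request. It rests on assumptions this card does not state: the text to synthesize goes in the user message content, and the raw response body is saved to a file as-is. The card does not document how audio is returned, so inspect the saved body before trying to decode it. The helpers json_escape, build_request_body, and send_request are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the quick-start request so any text can be sent.
# Assumptions (not from the model card): text goes in the user message
# content, and the response body is saved unmodified for inspection.
set -euo pipefail

MODEL_ID="378cadf87af34176971cb0e62950b7c4"

# Minimal JSON string escaping: double backslashes first, then escape
# double quotes. Assumes single-line text without other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the chat-completions request body for the given text.
build_request_body() {
  local text
  text="$(json_escape "$1")"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$MODEL_ID" "$text"
}

# POST the request and write the raw response to the given file.
send_request() {
  local text="$1" out_file="$2"
  curl -sS https://api.oneinfer.ai/v1/chat/completions \
    -H "Authorization: Bearer ${ONEINFER_API_KEY:?set ONEINFER_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_request_body "$text")" \
    -o "$out_file"
}
```

For example, `send_request 'Namaste, aap kaise hain?' response.json` writes the raw API response to response.json. The sed-based escaping handles only backslashes and double quotes, so for multi-line or otherwise complex input a dedicated JSON tool such as jq is the safer choice.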

Providers

Available routing options for this model through OneInfer.

sarvam

378cadf87af34176971cb0e62950b7c4

Available

Input

$0.000 / 1M tokens

Output

$0.000 / 1M tokens

Routing

OneInfer optimized

Pricing

Current OneInfer pricing for this model.

Usage	Price
Input tokens	$0.000 / 1M
Output tokens	$0.000 / 1M

Performance

Published evaluation results associated with this model.

Naturalness

Listener Preference (48 kHz): 63.14

Listener Preference (8 kHz): 77.95

Stability

Error Rate (%): 8.6

Mispronunciation Rate (%): 7.84
