sarvam
bulbul:v3
Context: 2000
Input: text
Output: audio
Tool calling: Not listed
About this model
Bulbul v3 is a production-grade text-to-speech model optimized for 11 Indian languages. It natively supports code-mixed speech and professional voice cloning, and it is tuned for stable output in 8 kHz telephony environments. Prosody, emphasis, and emotional tone are inferred automatically from the input text.
Capabilities
text: Available through the unified API
audio: Available through the unified API
Quick start
curl https://api.oneinfer.ai/v1/chat/completions \
  -H "Authorization: Bearer $ONEINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "378cadf87af34176971cb0e62950b7c4",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
Providers
Available routing options for this model through OneInfer.
| Provider | Model ID | Input | Output | Routing |
|---|---|---|---|---|
| sarvam | 378cadf87af34176971cb0e62950b7c4 | $0.000 / 1M | $0.000 / 1M | OneInfer optimized |
Pricing
Current OneInfer pricing for this model.
| Usage | Price |
|---|---|
| Input tokens | $0.000 / 1M |
| Output tokens | $0.000 / 1M |
Performance
Published evaluation results associated with this model.
Naturalness
| Metric | Score |
|---|---|
| Listener Preference (48 kHz) | 63.14 |
| Listener Preference (8 kHz) | 77.95 |
Stability
| Metric | Value |
|---|---|
| Error Rate (%) | 8.6 |
| Mispronunciation Rate (%) | 7.84 |
API example
The request is identical to the one shown under Quick start.
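As a minimal sketch, the Quick start request can be wrapped in a small POSIX shell script that builds the JSON body from a variable and refuses to send without an API key. The endpoint, header names, and model ID come from this page. The helper names `json_escape`, `build_payload`, and `send_request` are hypothetical. The page does not say how the audio is returned, so the response is printed unmodified.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any OneInfer tooling.
# Endpoint and model ID are copied from the Quick start request on this page.
API_URL="https://api.oneinfer.ai/v1/chat/completions"
MODEL_ID="378cadf87af34176971cb0e62950b7c4"

json_escape() {
  # Escape backslashes and double quotes so the text fits inside a JSON string.
  # Newlines and other control characters are NOT handled in this sketch.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

build_payload() {
  # Produce the same body shape as the Quick start example, with the
  # caller's text as the single user message.
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$MODEL_ID" "$(json_escape "$1")"
}

send_request() {
  # Fail early instead of sending an unauthenticated request.
  if [ -z "${ONEINFER_API_KEY:-}" ]; then
    echo "ONEINFER_API_KEY is not set" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
  curl --fail --silent --show-error "$API_URL" \
    -H "Authorization: Bearer $ONEINFER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")"
}
```

After sourcing the script, `send_request 'Hello!'` issues the same call as the Quick start example, while `build_payload 'Hello!'` prints the body without touching the network, which is useful for checking the JSON before sending it.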