Chat Completions
https://api.oneinfer.ai/v1/ula/chat/completions

Generate a chat completion from a conversation history using any supported LLM provider. Supports both streaming and non-streaming responses via a unified interface compatible with OpenAI, Anthropic, DeepSeek, and more.
01 Request Headers
`Authorization`: Bearer token for authentication. Format: `Bearer <YOUR_TOKEN>`. Exchange your API key for a token via the Authentication endpoint.
`Content-Type`: Set to `application/json` for all chat completion requests.
02 Request Parameters
Core Parameters
`messages`: Array of message objects forming the conversation. Each object has a `role` ('system', 'user', 'assistant') and a `content` string.
`provider`: LLM provider to use. Defaults to "openai". Use the GET Models endpoint to retrieve available provider and model details.
`model`: Model identifier (e.g. "gpt-4o-mini", "claude-sonnet-4-6"). Use the GET Models endpoint to retrieve available provider and model details.
Generation Controls
`temperature`: Sampling temperature from 0 to 2. Higher values produce more varied, creative outputs. Default: 0.7.
`max_tokens`: Maximum number of tokens to generate in the response. Default: 1000.
`top_p`: Nucleus sampling parameter. Only tokens whose cumulative probability mass falls within top_p are considered. Default: 1.0.
`stop`: One or more sequences at which to stop generation. Optional.
`stream`: If true, the response is streamed as Server-Sent Events (SSE). Default: false.
Advanced
ID of a dedicated serverless endpoint to route this request to. Optional — omit for standard routing.
Execution priority tier. Options: "default" or "flex". Default: "default".
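The parameters above can be assembled into a request in a few lines. The following is a minimal Python sketch: the helper name `build_request` is illustrative, and using the third-party `requests` package for transport is an assumption — any HTTP client works. Field names and the Bearer header format follow the sections above.

```python
import json

API_URL = "https://api.oneinfer.ai/v1/ula/chat/completions"

def build_request(token, messages, provider="openai", model="gpt-4o-mini",
                  temperature=0.7, max_tokens=1000, stream=False):
    """Assemble the headers and JSON body for a chat completion call.

    Field names mirror the parameter list in this section; the
    Bearer-token format follows the Request Headers notes.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {
        "provider": provider,
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return headers, payload

# Sending the request (assumes the third-party `requests` package):
#   import requests
#   headers, payload = build_request("<YOUR_TOKEN>",
#                                    [{"role": "user", "content": "Hi"}])
#   resp = requests.post(API_URL, headers=headers, data=json.dumps(payload))
```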
{
"provider": "openai",
"model": "gpt-4o-mini",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Tell me about AI."
}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}

03 Response
Response Fields
Unique request identifier.
Unix timestamp of when the completion was created.
The generated assistant message content.
Why generation stopped. E.g. "stop", "length".
The provider that handled the request.
The model that generated the response.
Token usage: prompt_tokens, completion_tokens, total_tokens.
End-to-end latency in milliseconds.
True if the context window was fully utilised.
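The token-usage object can be summarized in one line. In this sketch, only the `prompt_tokens`, `completion_tokens`, and `total_tokens` keys are taken from the field list above; the helper name is illustrative.

```python
def summarize_usage(usage):
    """Format the documented usage fields as a short summary string.

    `usage` is the token-usage object from the response body, with
    prompt_tokens, completion_tokens, and total_tokens keys.
    """
    return (f"{usage['prompt_tokens']} prompt + "
            f"{usage['completion_tokens']} completion = "
            f"{usage['total_tokens']} total tokens")
```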
04 Streaming Response (stream: true)
When stream: true, the server returns a stream of Server-Sent Events (SSE). Each event contains a JSON chunk with the incremental delta. The stream ends with data: [DONE].
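A minimal way to consume the stream is to read it line by line, pick out `data:` payloads, and stop at the `[DONE]` sentinel. This sketch assumes each event's data field is a JSON object; the exact chunk schema is not specified here, and the function name is illustrative.

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an SSE body, stopping at [DONE].

    `lines` is any iterable of decoded text lines, e.g. from
    response.iter_lines() when streaming with the `requests` package.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)
```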
05 Status Codes
| Code | Status | Description |
|---|---|---|
| 200 | OK | Completion returned successfully. |
| 400 | Bad Request | Invalid JSON body, missing required fields (provider/messages), unsupported provider, or invalid field values. |
| 401 | Unauthorized | Missing or invalid Authorization header / Bearer token. |
| 403 | Forbidden | Insufficient credit balance to process the request. |
| 415 | Unsupported Media Type | Content-Type must be application/json or multipart/form-data. |
| 422 | Unprocessable Entity | Request body failed schema validation. |
| 500 | Internal Server Error | Unexpected error during chat completion generation. |
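As a rough guide to acting on the table above: the client-side errors (400, 401, 403, 415, 422) are not worth retrying until the request, credentials, or balance is corrected, while a 500 may be transient. This sketch encodes that split — the retry policy itself is an assumption, not part of the API contract.

```python
def classify_status(code):
    """Map a chat-completion response status code to a client action.

    Returns "ok" for success, "fix_request" for client errors that must
    be corrected before resending, "retry" for server-side failures that
    may be transient, or "unexpected" for anything undocumented.
    """
    if code == 200:
        return "ok"
    if code in (400, 401, 403, 415, 422):
        return "fix_request"
    if code == 500:
        return "retry"
    return "unexpected"
```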