Endpoints

Chat Completions

POST https://api.oneinfer.ai/v1/ula/chat/completions

Generate a chat completion from a conversation history using any supported LLM provider. Supports both streaming and non-streaming responses via a unified interface compatible with OpenAI, Anthropic, DeepSeek, and more.

01 Request Headers

Authorization (string, required)

Bearer token for authentication. Format: Bearer <YOUR_TOKEN>. Exchange your API key for a token via the Authentication endpoint.

Content-Type (string, required)

Set to application/json for all chat completion requests.
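The two headers above can be assembled in a small helper. A minimal Python sketch; the function name and token value are illustrative, not part of the API:

```python
def build_headers(bearer_token: str) -> dict:
    """Return the headers required by the chat completions endpoint."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
    }
```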

02 Request Parameters

Core Parameters

messages (array, required)

Array of message objects forming the conversation. Each object has a role ('system', 'user', 'assistant') and a content string.

provider (string)

LLM provider to use. Defaults to "openai". Use the GET Models endpoint to retrieve available provider and model details.

model (string)

Model identifier to use (e.g. "gpt-4o-mini", "claude-sonnet-4-6"). Use the GET Models endpoint to retrieve available provider and model details.

Generation Controls

temperature (number)

Sampling temperature from 0 to 2. Higher values produce more creative outputs. Default: 0.7.

max_tokens (integer)

Maximum number of tokens to generate in the response. Default: 1000.

top_p (number)

Nucleus sampling parameter. Only tokens whose cumulative probability is within top_p are considered. Default: 1.0.

stop (string[])

One or more sequences at which to stop generation. Optional.

stream (boolean)

If true, the response is streamed as Server-Sent Events (SSE). Default: false.

Advanced

endpoint_id (string)

ID of a dedicated serverless endpoint to route this request to. Optional — omit for standard routing.

service_tier (string)

Execution priority tier. Options: "default" or "flex". Default: "default".

Example Request
// Example chat completion request (non-streaming)
{
  "provider": "openai",
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about AI."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
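The same request body can be built and sent programmatically. A minimal Python sketch using only the standard library; `build_payload` and `chat_completion` are illustrative helper names (not part of the API), and the default values mirror the documented ones (temperature 0.7, max_tokens 1000, stream false):

```python
import json
import urllib.request

API_URL = "https://api.oneinfer.ai/v1/ula/chat/completions"

def build_payload(messages, provider="openai", model="gpt-4o-mini",
                  temperature=0.7, max_tokens=1000, stream=False):
    """Assemble a request body using the documented defaults."""
    return {
        "provider": provider,
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }

def chat_completion(token, payload):
    """POST the payload and return the parsed JSON response envelope."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```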

03 Response

Response Fields

id (string)

Unique request identifier.

created (integer)

Unix timestamp of when the completion was created.

text (string)

The generated assistant message content.

finish_reason (string)

Why generation stopped, e.g. "stop" or "length".

provider (string)

The provider that handled the request.

model (string)

The model that generated the response.

usage (object)

Token usage: prompt_tokens, completion_tokens, total_tokens.

latency_ms (number)

End-to-end latency in milliseconds.

is_context_length_full (boolean)

True if the context window was fully utilised.

200 OK (application/json)
{
  "api_details": {
    "api_status": "success",
    "api_message": "Chat completion generated successfully."
  },
  "data": {
    "id": "req_abc123",
    "created": 1711468800,
    "text": "AI (Artificial Intelligence) refers to systems that simulate human intelligence...",
    "finish_reason": "stop",
    "provider": "openai",
    "model": "gpt-4o-mini",
    "usage": {
      "prompt_tokens": 24,
      "completion_tokens": 156,
      "total_tokens": 180
    },
    "latency_ms": 1250,
    "is_context_length_full": false
  },
  "error": {}
}
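A client typically unpacks this envelope by checking `api_details.api_status` before reading `data`. A minimal Python sketch; the helper name and the choice of exception are illustrative:

```python
def extract_completion(response: dict):
    """Return (text, usage) from a chat completion response envelope,
    raising if the API reported a failure."""
    if response.get("api_details", {}).get("api_status") != "success":
        raise RuntimeError(response.get("error") or "chat completion failed")
    data = response["data"]
    return data["text"], data["usage"]
```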

04 Streaming Response (stream: true)

When stream: true, the server returns a stream of Server-Sent Events (SSE). Each event contains a JSON chunk with the incremental delta. The stream ends with data: [DONE].

text/event-stream
data: {"id":"chunk_1","text":"AI","finish_reason":null}

data: {"id":"chunk_2","text":" (Artificial","finish_reason":null}

data: {"id":"chunk_3","text":" Intelligence)","finish_reason":null}

data: {"id":"chunk_4","text":" refers to systems...","finish_reason":null}

data: {"id":"chunk_final","text":"","finish_reason":"stop"}

data: [DONE]
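A client consuming this stream reassembles the full text by concatenating the `text` field of each chunk until the `[DONE]` sentinel. A minimal line-oriented Python sketch (the helper name is illustrative; real SSE consumption over HTTP would also need connection and buffering logic):

```python
import json

def accumulate_sse(lines):
    """Rebuild the full completion text from SSE 'data:' lines,
    stopping at the [DONE] sentinel. Returns (text, finish_reason)."""
    text_parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break
        chunk = json.loads(body)
        text_parts.append(chunk.get("text", ""))
        if chunk.get("finish_reason"):
            finish_reason = chunk["finish_reason"]
    return "".join(text_parts), finish_reason
```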

Error Status Codes

Code | Status                 | Description
200  | OK                     | Completion returned successfully.
400  | Bad Request            | Invalid JSON body, missing required fields (provider/messages), unsupported provider, or invalid field values.
401  | Unauthorized           | Missing or invalid Authorization header / Bearer token.
403  | Forbidden              | Insufficient credit balance to process the request.
415  | Unsupported Media Type | Content-Type must be application/json or multipart/form-data.
422  | Unprocessable Entity   | Request body failed schema validation.
500  | Internal Server Error  | Unexpected error during chat completion generation.
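These codes can be translated into descriptive client-side errors before any retry logic runs. A Python sketch with messages taken from the table above; the exception type is a placeholder, not something the API defines:

```python
def check_status(status_code: int) -> None:
    """Raise a descriptive error for the documented non-2xx status codes."""
    errors = {
        400: "Bad Request: invalid body, missing fields, or unsupported provider",
        401: "Unauthorized: missing or invalid Bearer token",
        403: "Forbidden: insufficient credit balance",
        415: "Unsupported Media Type: use application/json",
        422: "Unprocessable Entity: body failed schema validation",
        500: "Internal Server Error: a retry may succeed",
    }
    if status_code in errors:
        raise RuntimeError(f"{status_code} {errors[status_code]}")
```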
