Integration Patterns
Retrieval-Augmented Generation
Inject relevant context from your own knowledge base into the system prompt before sending the request to the model. This grounds the model's answer in your data instead of relying solely on its training.
Pattern overview
- Embed the user query using an embedding model.
- Retrieve the top-k relevant chunks from your vector store.
- Concatenate the chunks into a context block in the system message.
- Send the conversation to OneInfer and return the response.
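The first two steps can be sketched with a toy in-memory vector store. Everything here is an illustrative assumption — `embed` stands in for a real embedding-model call, and `STORE` holds made-up chunks with precomputed vectors; in practice you would query an embedding endpoint and a real vector database.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Placeholder store: (chunk text, precomputed embedding).
STORE = [
    ("OneInfer supports OpenAI, Anthropic, and Novita providers.", [0.9, 0.1, 0.0]),
    ("Authentication uses short-lived bearer tokens.", [0.1, 0.8, 0.2]),
    ("Rate limits vary by plan.", [0.0, 0.2, 0.9]),
]

def embed(query: str) -> list[float]:
    # Stand-in for a real embedding-model call.
    return [0.8, 0.2, 0.1]

def top_k(query: str, k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query embedding
    # and return the text of the k closest ones.
    q = embed(query)
    ranked = sorted(STORE, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(top_k("Which providers does OneInfer support?"))
```

With a real store, `top_k` would be replaced by a single similarity-search call; the ranking logic itself is the same.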
Python (simplified)
```python
import requests

BASE_URL = "https://api.oneinfer.ai"

# Exchange the API key for a short-lived bearer token.
token = requests.post(
    f"{BASE_URL}/v1/ula/oauth-authentication?api_key=YOUR_API_KEY"
).json()["access_token"]
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

def retrieve_context(query: str) -> str:
    # Replace with your actual vector store lookup.
    return "OneInfer supports OpenAI, Anthropic, and Novita providers."

def rag_answer(user_query: str) -> str:
    context = retrieve_context(user_query)
    system_prompt = (
        "You are a helpful assistant. Use the following context to answer the user's question. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}"
    )
    response = requests.post(
        f"{BASE_URL}/v1/ula/chat/completions",
        headers=headers,
        json={
            "provider": "openai",
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_query},
            ],
            "max_tokens": 512,
            "temperature": 0.3,
        },
    )
    response.raise_for_status()
    return response.json()["data"]["text"]

print(rag_answer("Which providers does OneInfer support?"))
```
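When `retrieve_context` returns several chunks rather than one string, a common refinement of step 3 is to join them with a visible separator and cap the total size so the prompt stays inside the model's context window. The helper below is a minimal sketch; the `max_chars` budget is an illustrative assumption (a rough character-count proxy for a real token budget).

```python
def build_context(chunks: list[str], max_chars: int = 4000) -> str:
    # Join retrieved chunks with separators, stopping before the
    # character budget is exceeded.
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk)
    return "\n---\n".join(parts)

print(build_context(["chunk one", "chunk two", "x" * 5000]))
```

The oversized third chunk is dropped, so the first two are joined with `---` separators. In production you would count tokens with the model's tokenizer rather than characters, but the truncation logic is the same.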