
PDF Q&A Application

Let users upload any PDF and ask natural-language questions about its contents. This example combines document chunking, embeddings, and LLM inference on OneInfer.

Uses: OneInfer Chat API · OneInfer Embeddings API · LangChain · Python · FAISS

Architecture

1. Parse the PDF and split its text into chunks
2. Embed each chunk via the OneInfer embeddings API
3. At query time: embed the question, retrieve the closest chunks, and generate an answer

Step-by-step guide

Step 1: Install dependencies

```bash
pip install openai langchain langchain-community faiss-cpu pypdf
```
Step 2: Parse and chunk your PDF

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF; each page becomes one Document
loader = PyPDFLoader("document.pdf")
pages = loader.load()

# Split pages into ~512-character chunks with 64 characters of overlap,
# so context isn't lost at chunk boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_documents(pages)
```
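To build intuition for the `chunk_size`/`chunk_overlap` parameters, here is a deliberately simplified sliding-window chunker in plain Python. This is an illustration only, not LangChain's actual algorithm (the recursive splitter prefers to break on paragraph and sentence separators rather than at fixed offsets):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    # Slide a fixed-size window over the text, stepping forward by
    # chunk_size - chunk_overlap so consecutive chunks share some characters
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(1000))
pieces = chunk_text(text)
print(len(pieces))                        # 3 chunks cover 1000 characters
print(pieces[0][-64:] == pieces[1][:64])  # True: adjacent chunks overlap by 64 chars
```

The overlap means a sentence cut off at one chunk's end is still fully present at the start of the next, which improves retrieval quality.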
Step 3: Embed chunks using OneInfer

```python
import openai

# Point the OpenAI-compatible client at the OneInfer endpoint
client = openai.OpenAI(
    api_key="your-oneinfer-api-key",
    base_url="https://api.oneinfer.ai/v1"
)

def embed(texts: list[str]) -> list[list[float]]:
    # Embed a batch of texts; returns one vector per input text
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [d.embedding for d in response.data]

# Embed all chunks
texts = [c.page_content for c in chunks]
vectors = embed(texts)
```
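Before wiring up FAISS, it helps to see what an exact (flat) L2 index actually computes: the distance from the query vector to every stored vector, keeping the `k` smallest. The brute-force NumPy equivalent, on toy 2-D vectors, is:

```python
import numpy as np

def nearest(query: np.ndarray, vectors: np.ndarray, k: int = 2) -> np.ndarray:
    # Squared L2 distance from the query to every stored vector,
    # then the indices of the k smallest distances
    dists = ((vectors - query) ** 2).sum(axis=1)
    return np.argsort(dists)[:k]

vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
q = np.array([0.9, 0.1], dtype="float32")
print(nearest(q, vecs))  # [1 0]: the vector [1, 0] is closest to the query
```

`faiss.IndexFlatL2` performs exactly this exhaustive search, just far faster and over high-dimensional embedding vectors.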
Step 4: Build a FAISS index and query it

```python
import numpy as np
import faiss

# Exact (brute-force) L2 index over the chunk embeddings
dim = len(vectors[0])
index = faiss.IndexFlatL2(dim)
index.add(np.array(vectors, dtype="float32"))

def answer_question(question: str) -> str:
    # Embed the question and retrieve the 4 most similar chunks
    q_vec = np.array(embed([question]), dtype="float32")
    _, indices = index.search(q_vec, k=4)

    context = "\n\n".join(texts[i] for i in indices[0])

    # Ask the chat model to answer using only the retrieved context
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": "Answer questions based only on the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return response.choices[0].message.content

print(answer_question("What are the key findings in chapter 3?"))
```
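A note on the distance metric: embeddings are often compared by cosine similarity rather than raw L2 distance. If you normalize every vector to unit length before indexing, the two rankings coincide, because for unit vectors the squared L2 distance equals `2 - 2 * cosine`. A NumPy sketch of that equivalence (using made-up toy vectors, not real embeddings):

```python
import numpy as np

def normalize(rows: np.ndarray) -> np.ndarray:
    # Scale each row to unit length
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

vecs = normalize(np.array([[3.0, 4.0], [1.0, 0.0], [-1.0, 1.0]]))
q = normalize(np.array([[1.0, 1.0]]))[0]

# For unit vectors: ||v - q||^2 = 2 - 2 * (v . q),
# so ranking by smallest L2 distance equals ranking by largest cosine
l2 = ((vecs - q) ** 2).sum(axis=1)
cos = vecs @ q
print(np.argsort(l2).tolist() == np.argsort(-cos).tolist())  # True
```

So if you want cosine-style retrieval with the `IndexFlatL2` index above, normalizing the chunk and query vectors before `add` and `search` is enough; no other code changes are needed.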