RAG Knowledge Assistant - Tryfonas Papantoniou

Order-to-Cash knowledge assistant


Ask a question about the fictional company's O2C policy. The assistant answers only from the 12 indexed documents.

How it works

01
Build-time indexing
Twelve markdown policy documents are chunked by section, embedded with Voyage as 1024-dimensional vectors, and written to a compact JSON index shipped with the build.
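
A minimal sketch of the indexing step, assuming sections are delimited by markdown headings. The names `chunk_by_section` and `build_index` and the JSON layout are hypothetical, and `embed` is a stand-in for whatever call returns the Voyage vectors; none of this is taken from the actual build script.

```python
import json
import re
from typing import Callable

def chunk_by_section(doc_id: str, markdown: str) -> list[dict]:
    """Split one markdown document into chunks, one per heading-delimited section."""
    chunks: list[dict] = []
    title, body = "(intro)", []

    def flush() -> None:
        # Emit the section collected so far, skipping empty ones.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"doc": doc_id, "section": title, "text": text})

    for line in markdown.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*)$", line)
        if heading:
            flush()
            title, body = heading.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks

def build_index(
    docs: dict[str, str],
    embed: Callable[[list[str]], list[list[float]]],  # hypothetical embedding hook
) -> str:
    """Chunk every document, attach one vector per chunk, return compact JSON."""
    chunks = [c for doc_id, md in docs.items() for c in chunk_by_section(doc_id, md)]
    vectors = embed([c["text"] for c in chunks])
    for chunk, vector in zip(chunks, vectors):
        chunk["embedding"] = [round(x, 6) for x in vector]
    # Tight separators keep the shipped file small.
    return json.dumps({"chunks": chunks}, separators=(",", ":"))
```

Passing the embedding call in as a parameter keeps the chunking logic testable without network access.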
02
Query-time retrieval
Your question is embedded the same way, then the retriever ranks chunks by cosine similarity. The top four go into the prompt. A toggle swaps the in-memory backend for Pinecone: same interface, different store.
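
The ranking step can be sketched roughly as follows. The `Retriever` protocol and `InMemoryRetriever` class are hypothetical names chosen for illustration; a Pinecone-backed class would expose the same `search` method and delegate to the hosted index instead of scanning a list.

```python
import math
from typing import Protocol

class Retriever(Protocol):
    """Shared interface (assumed): any backend returns (score, chunk) pairs."""
    def search(self, query_vec: list[float], k: int = 4) -> list[tuple[float, dict]]: ...

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryRetriever:
    """Brute-force scan over the chunks loaded from the JSON index."""
    def __init__(self, chunks: list[dict]) -> None:
        self.chunks = chunks

    def search(self, query_vec: list[float], k: int = 4) -> list[tuple[float, dict]]:
        scored = [(cosine(query_vec, c["embedding"]), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
        return scored[:k]
```

A linear scan is a reasonable choice at this scale: twelve documents produce few enough chunks that no approximate-nearest-neighbour structure is needed.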
03
Grounded generation
Claude Haiku 4.5 answers using only the retrieved chunks, with an explicit instruction to refuse when the corpus doesn't cover the question. Responses stream token-by-token.
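
One way the grounding instruction might be assembled is shown below. The wording of the system prompt, the `REFUSAL` string, and the `build_prompt` helper are all assumptions for illustration, not the prompt the assistant actually uses; the model call and streaming are left out.

```python
# Hypothetical refusal text; the real assistant's wording is not shown in the source.
REFUSAL = "I can't answer that from the indexed documents."

def build_prompt(question: str, hits: list[tuple[float, dict]]) -> tuple[str, str]:
    """Return (system_prompt, user_message) built from the retrieved chunks."""
    # Number each excerpt so the answer can point back to its source.
    context = "\n\n".join(
        f"[{i}] {chunk['doc']} / {chunk['section']}\n{chunk['text']}"
        for i, (_score, chunk) in enumerate(hits, start=1)
    )
    system = (
        "Answer using only the numbered excerpts below. "
        f"If they do not cover the question, reply exactly: {REFUSAL}\n\n"
        f"Excerpts:\n{context}"
    )
    return system, question
```

Keeping the excerpts in the system prompt and the question in the user message separates the instructions from the user's input.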
04
Inline citations
Every retrieved chunk appears as a chip under the answer. Hovering over a chip shows its similarity score; clicking it opens the full text so you can verify the model isn't making things up.
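
As a rough illustration, the data behind each chip could be derived from the retrieval results like this. The `to_chips` helper and its field names are hypothetical; the real UI may shape this payload differently.

```python
def to_chips(hits: list[tuple[float, dict]]) -> list[dict]:
    """Turn (score, chunk) pairs into the payload a citation chip needs."""
    return [
        {
            "label": f"{chunk['doc']} / {chunk['section']}",  # shown on the chip
            "score": round(score, 3),                         # shown on hover
            "full_text": chunk["text"],                       # shown on click
        }
        for score, chunk in hits
    ]
```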