RAG Knowledge Assistant
Ask a question about a fictional enterprise's order-to-cash policy. Answers come from 12 indexed documents covering credit limits, dunning cadences, dispute resolution, and bad-debt provisioning. Every claim is cited back to the source chunk it came from.
Embeddings: Voyage AI · voyage-3
Retriever: pluggable (in-memory + Pinecone)
Answer model: Claude Haiku 4.5
Runtime: Next.js · Vercel
Order-to-Cash knowledge assistant
Ask a question about the fictional company's O2C policy. The assistant only answers from the 12 indexed documents.
How it works
- 01 Build-time indexing: Twelve Markdown policy documents are chunked by section and embedded with Voyage as 1024-dimensional vectors. The vectors are written to a compact JSON index that ships with the build.
- 02 Query-time retrieval: Your question is embedded the same way, then the retriever ranks chunks by cosine similarity. The top four go into the prompt. A toggle swaps the in-memory backend for Pinecone: same interface, different store.
- 03 Grounded generation: Claude Haiku 4.5 answers using only the retrieved chunks, with an explicit instruction to refuse when the corpus doesn't cover the question. Responses stream token by token.
- 04 Inline citations: Every source chunk surfaces as a chip under the answer. Hovering shows the similarity score; clicking opens the full text so you can verify the model isn't making things up.
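The query-time retrieval step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Chunk`, `ScoredChunk`, and `Retriever` shapes and the `rankByCosine` and `InMemoryRetriever` names are assumptions made for the example, and the index is assumed to be an array of chunks with precomputed embeddings. A Pinecone-backed class would implement the same `Retriever` interface and delegate ranking to the remote index.

```typescript
// Hypothetical shapes: the page does not show the real index schema.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: Chunk;
  score: number;
}

// Shared interface so an in-memory store and a remote store are interchangeable.
interface Retriever {
  retrieve(queryEmbedding: number[], topK: number): Promise<ScoredChunk[]>;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity rather than dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query, sort descending, keep the top K.
function rankByCosine(query: number[], chunks: Chunk[], topK: number = 4): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// In-memory backend: a brute-force scan, which is adequate for a small corpus.
class InMemoryRetriever implements Retriever {
  private readonly chunks: Chunk[];

  constructor(chunks: Chunk[]) {
    this.chunks = chunks;
  }

  async retrieve(queryEmbedding: number[], topK: number = 4): Promise<ScoredChunk[]> {
    return rankByCosine(queryEmbedding, this.chunks, topK);
  }
}
```

Keeping `retrieve` asynchronous even for the in-memory case is a design choice in this sketch: it lets callers treat a local scan and a network call to a vector store identically, which is what makes the backend toggle a drop-in swap.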