
How queries are processed and results returned.
When a user submits a query, Simple RAG embeds it and searches the vector database for semantically similar chunks. Results are ranked, assembled into context, and optionally passed through an LLM for answer generation.
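The user's query is converted to a vector with the same embedding model that was used during indexing. Query vectors and chunk vectors therefore lie in the same vector space and can be compared directly.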
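The property that matters here is that indexing and querying go through one embedder. The sketch below uses a hypothetical `HashingEmbedder` (a hashed bag-of-words toy) in place of a real embedding model, since this page does not name Simple RAG's model. The point is only that the same instance embeds both the chunks and the query, so both produce vectors of the same dimension with the same token-to-coordinate mapping.

```python
import hashlib
import math
import re


class HashingEmbedder:
    """Hypothetical stand-in for a real embedding model (hashed bag of words)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            # md5 gives a stable bucket across runs (unlike Python's salted hash()).
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        # L2-normalize so every non-empty text maps to a unit vector.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec


embedder = HashingEmbedder(dim=64)

# Indexing time: every chunk is embedded with this embedder.
chunks = [
    "Cosine similarity compares vector directions.",
    "The wizard configures the pipeline.",
]
chunk_vectors = [embedder.embed(c) for c in chunks]

# Query time: the SAME embedder is reused, so the query vector is comparable
# with the stored chunk vectors.
query_vector = embedder.embed("How does cosine similarity work?")
```

Swapping in a different model at query time would break the comparison even if the dimensions happened to match, because the coordinates would no longer mean the same thing.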
A cosine similarity search then returns the top-K chunks whose vectors are most similar to the query vector.
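As an illustration of what top-K cosine search computes, here is a brute-force sketch in plain Python. A vector database will typically use an index (possibly approximate) instead of scoring every vector; the function names `cosine_similarity` and `top_k` are illustrative, not part of Simple RAG.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k most similar vectors, best first."""
    scored = ((i, cosine_similarity(query, v)) for i, v in enumerate(vectors))
    # heapq.nlargest avoids fully sorting all scores when k is small.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
results = top_k([1.0, 0.1], vectors, k=2)
print([i for i, _ in results])  # [0, 2]
```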
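Retrieved chunks are then ranked on two criteria:

| Ranking criterion | Description |
|---|---|
| Relevance | Cosine similarity between the chunk and the query |
| Diversity | Reduces duplicate information among the selected chunks |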
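This page does not say how relevance and diversity are combined. One common technique that balances the two is Maximal Marginal Relevance (MMR), sketched below as an assumption rather than as Simple RAG's actual method. At each step it picks the candidate with the best trade-off between similarity to the query and dissimilarity to chunks already chosen; `mmr_rerank` and its `lambda_mult` weight are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(query: list[float], candidates: list[list[float]], k: int,
               lambda_mult: float = 0.5) -> list[int]:
    """Pick up to k candidate indices by Maximal Marginal Relevance.

    score(c) = lambda_mult * sim(query, c)
               - (1 - lambda_mult) * max(sim(c, s) for s already selected)
    lambda_mult = 1.0 ranks by relevance only; lower values favor diversity.
    """
    relevance = [cosine_similarity(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = remaining[0], -math.inf
        for i in remaining:
            # Redundancy: similarity to the closest chunk already selected.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


query = [1.0, 1.0]
candidates = [[1.0, 0.9], [1.0, 0.85], [0.6, 1.0]]  # chunk 1 nearly duplicates chunk 0
print(mmr_rerank(query, candidates, k=2))                   # [0, 2]
print(mmr_rerank(query, candidates, k=2, lambda_mult=1.0))  # [0, 1], relevance only
```

With relevance alone, the near-duplicate chunk 1 takes the second slot; with the diversity penalty, the less redundant chunk 2 replaces it.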
The highest-ranked chunks are then combined into a single, coherent context window.
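A minimal sketch of context assembly follows, assuming the chunks arrive already ranked best first and that the window is limited by a budget. Real pipelines usually count tokens; this sketch counts characters to stay dependency-free, and `assemble_context`, `max_chars`, and the separator are hypothetical.

```python
def assemble_context(chunks: list[str], max_chars: int = 2000,
                     separator: str = "\n\n---\n\n") -> str:
    """Join ranked chunks (best first) until adding another would exceed max_chars."""
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        # Account for the separator that will precede every chunk after the first.
        extra = len(chunk) + (len(separator) if parts else 0)
        if used + extra > max_chars:
            break  # stop at the first chunk that would overflow, preserving rank order
        parts.append(chunk)
        used += extra
    return separator.join(parts)


print(assemble_context(["aaaa", "bbbb", "cccc"], max_chars=10, separator="|"))  # aaaa|bbbb
```

Stopping at the first overflowing chunk keeps the context in rank order; an alternative design would skip it and try smaller, lower-ranked chunks.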
Answer generation is optional. To pass the assembled context to an LLM, set `llmEnabled = true` in the wizard.
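The sketch below shows how a flag like `llmEnabled` can gate the final step. It assumes that, with generation off, the assembled context itself is returned. The prompt template, `build_prompt`, `respond`, and the `generate` callable (standing in for whatever LLM client is configured) are all hypothetical.

```python
from typing import Callable


def build_prompt(query: str, context: str) -> str:
    # Hypothetical prompt template; the real one is not documented on this page.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def respond(query: str, context: str, llm_enabled: bool,
            generate: Callable[[str], str]) -> str:
    """Return an LLM answer if llm_enabled, otherwise the assembled context."""
    if not llm_enabled:
        return context
    return generate(build_prompt(query, context))


def fake_llm(prompt: str) -> str:
    # Stub used instead of a real model client for illustration.
    return "stub answer"


print(respond("What is RAG?", "retrieved chunks...", llm_enabled=False, generate=fake_llm))
print(respond("What is RAG?", "retrieved chunks...", llm_enabled=True, generate=fake_llm))
```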