
Hybrid RAG — Retrieval Pipeline

How queries are processed with dual vector + BM25 search.

Overview

Hybrid RAG retrieval executes two parallel searches (vector + BM25), then combines the results using a reranking strategy before returning the final ranked list.

Pipeline Steps

1. Query Embedding

The query is embedded with the same model that was used at indexing time, so the query vector and the stored chunk vectors are directly comparable.
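As an illustration of how a query vector is compared with chunk vectors, here is a minimal sketch of cosine similarity in plain Python. The source does not say which similarity metric this pipeline uses, so cosine is an assumption, and the function name is hypothetical. The dimension check only catches the most obvious model mismatch; two different models with the same output size would pass it and still give meaningless scores.

```python
import math


def cosine_similarity(query_vec, chunk_vec):
    """Cosine similarity between two embedding vectors (assumed metric)."""
    if len(query_vec) != len(chunk_vec):
        # Different sizes mean the vectors cannot come from the same model.
        raise ValueError("query and chunk vectors have different dimensions")
    dot = sum(q * c for q, c in zip(query_vec, chunk_vec))
    query_norm = math.sqrt(sum(q * q for q in query_vec))
    chunk_norm = math.sqrt(sum(c * c for c in chunk_vec))
    if query_norm == 0.0 or chunk_norm == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (query_norm * chunk_norm)
```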

2. BM25 Query Processing

The query is tokenized for keyword matching against the BM25 inverted index.
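The keyword side can be sketched as follows. This is a minimal illustration, not the product's implementation: the tokenizer (lowercase, split on non-alphanumerics), the parameter values k1=1.5 and b=0.75, and the function names are all assumptions, and it scores documents directly rather than through an inverted index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document, in input order."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents need more matches to score as high.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

The query must be tokenized the same way as the indexed text; otherwise a term such as "BM25" in the query would never match "bm25" in the index.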

3. Dual Search

Vector Search: Finds semantically similar chunks
BM25 Search: Finds chunks with exact keyword matches
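One way to run the two searches side by side is a small thread pool. The sketch below uses hypothetical stand-in functions (vector_search, bm25_search) that return hard-coded results; in a real system they would query the vector index and the BM25 index. The chunk ids and scores are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_search(query, top_k):
    """Hypothetical stand-in: returns (chunk_id, similarity), best first."""
    return [("chunk-2", 0.91), ("chunk-7", 0.84), ("chunk-1", 0.60)][:top_k]


def bm25_search(query, top_k):
    """Hypothetical stand-in: returns (chunk_id, bm25_score), best first."""
    return [("chunk-7", 12.3), ("chunk-4", 9.8), ("chunk-2", 4.1)][:top_k]


def dual_search(query, top_k=3):
    """Run both searches concurrently and return both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(vector_search, query, top_k)
        bm25_future = pool.submit(bm25_search, query, top_k)
        # result() blocks until each search finishes and re-raises its errors.
        return vector_future.result(), bm25_future.result()
```

Note that the two lists carry scores on different scales (similarities versus BM25 scores), which is why a separate fusion step is needed before they can be merged.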

4. Hybrid Reranking

Method	Description
RRF (Reciprocal Rank Fusion)	Combines ranks from both searches (recommended)
Weighted Sum	Weighted combination of vector and BM25 scores
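The two methods in the table can be sketched as below. Several details are assumptions rather than documented behavior: the RRF constant k=60 (a commonly used default), min-max normalization of scores before the weighted sum, and giving the vector side a weight of 1 - bm25_weight. Function names are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over ranked lists of chunk ids (best first)."""
    fused = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); missing chunks contribute 0.
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def min_max(scores):
    """Rescale a {chunk_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - low) / (high - low) for chunk_id, s in scores.items()}


def weighted_sum_fuse(vector_scores, bm25_scores, bm25_weight=0.5):
    """Blend normalized scores; vector weight is assumed to be 1 - bm25_weight."""
    vec = min_max(vector_scores)
    kw = min_max(bm25_scores)
    fused = {
        chunk_id: (1.0 - bm25_weight) * vec.get(chunk_id, 0.0)
        + bm25_weight * kw.get(chunk_id, 0.0)
        for chunk_id in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

For example, rrf_fuse([["a", "b", "c"], ["b", "d", "a"]]) ranks the chunks b, a, d, c: b and a appear in both lists, and b has the better combined positions. RRF uses only ranks, so it needs no score normalization, whereas the weighted sum depends on how the two score scales are reconciled.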

5. Result Ranking

The final ranking orders chunks by the fused score produced by the reranking method, highest first.

6. Context Assembly

The top-ranked chunks are combined into a single context, up to the max_context_length limit.
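A minimal sketch of this step, assuming max_context_length is measured in characters (the source does not state the unit; a token count would work the same way with a different length function). The function name and the separator are hypothetical.

```python
def assemble_context(ranked_chunks, max_context_length, separator="\n\n"):
    """Join chunks in rank order until the next one would exceed the limit."""
    parts = []
    used = 0
    for chunk in ranked_chunks:
        # Count the separator only when a chunk already precedes this one.
        extra = len(chunk) + (len(separator) if parts else 0)
        if used + extra > max_context_length:
            break  # stop at the first chunk that does not fit
        parts.append(chunk)
        used += extra
    return separator.join(parts)
```

Stopping at the first chunk that does not fit keeps the context a strict prefix of the ranking. Skipping that chunk and trying smaller, lower-ranked ones would fill the budget more tightly but mix in less relevant text.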

7. Response Generation (Optional)

Activated when: llmEnabled = true
What it does: Passes the assembled context and the query to the LLM
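The optional step can be sketched as a simple gate around the LLM call. Here llm_enabled mirrors the llmEnabled setting, while the generate callable, the prompt wording, and the shape of the returned dictionary are hypothetical placeholders, not the product's API.

```python
def respond(query, context, llm_enabled, generate=None):
    """Return retrieval output, plus an LLM answer when generation is enabled."""
    if not llm_enabled:
        # Retrieval-only mode: the caller receives the context and no answer.
        return {"context": context, "answer": None}
    if generate is None:
        raise ValueError("llm_enabled is True but no generate callable was given")
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return {"context": context, "answer": generate(prompt)}
```

Passing the model call in as a parameter keeps the sketch independent of any particular LLM client and makes the disabled path easy to test.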

Key Differences from Simple RAG

Two parallel searches: Vector + BM25
Hybrid Reranking step: Combines results using RRF or Weighted Sum
bm25_weight setting: Controls BM25 influence (0.0-1.0)
