RAG Wizard

Step 5 — Pipeline Configuration

Configure embeddings, retrieval methods, search methods, and RAG pipeline parameters.

The pipeline configuration step determines how your documents are embedded, searched, and retrieved. This is the most critical step for retrieval quality.

Embedding Settings

Embedding settings control how your document text is converted into numerical vectors. The embedding model transforms text into points in multi-dimensional space, where semantically similar content ends up closer together. Your choice of model directly impacts search accuracy, speed, and cost.

Setting	What It Does
Embedding Model	Selects the AI model that converts text into vector representations for similarity search
Dimensions	Vector size (auto-selected based on the chosen model)
Similarity Function	Determines how similarity between vectors is calculated

Available Embedding Models

Model	Dimensions	Context	Best For
all-MiniLM-L6-v2	384	256 tokens	Fast prototyping and lightweight use
all-mpnet-base-v2	768	514 tokens	Balanced quality and speed
Jina-Embeddings-v2-Base-EN	768	8192 tokens	Long documents
BGE-Large-EN-v1.5	1024	512 tokens	Enterprise-grade performance
E5-Large-v2	1024	512 tokens	High-quality English retrieval
BGE-M3	1024	8192 tokens	Multilingual + long context
Stella-EN-1.5B-v5	1024	512 tokens	State-of-the-art performance (NovaSearch)

Some embedding models are only available on higher-tier plans. Check your plan details or contact sales for model availability.

Similarity Functions

Function	Description
Cosine	Measures angle between vectors (recommended for most cases)
Euclidean	Measures straight-line distance between vectors
Dot Product	Measures alignment and magnitude together (ranks the same as cosine when vectors are normalized to unit length)
Manhattan	Measures distance using grid-like paths
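
To make the table above concrete, here is a minimal sketch of the four functions in plain Python. The function names and the two sample vectors are illustrative assumptions, not part of the product; real embeddings have hundreds of dimensions, and the platform's internal implementation may differ.

```python
import math

def dot_product(a, b):
    # Sum of element-wise products; grows with both alignment and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths (angle only).
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # Sum of absolute per-dimension differences; smaller means more similar.
    return sum(abs(x - y) for x, y in zip(a, b))

# Two toy 2-dimensional vectors (hypothetical values for illustration).
a = [1.0, 0.0]
b = [3.0, 4.0]

print(cosine_similarity(a, b))
print(dot_product(a, b))
print(euclidean_distance(a, b))
print(manhattan_distance(a, b))
```

Note that cosine and dot product are similarity scores (higher is closer), while Euclidean and Manhattan are distances (lower is closer).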

Retrieval Method

Retrieval method determines how the chunks found by your search are assembled into the context window sent to the LLM. Different retrieval methods provide varying levels of context richness — from simple raw chunk content to LLM-enhanced summaries that include surrounding document structure and extracted entities.

Setting	What It Does
Retrieval Method	Determines how retrieved chunks are assembled into context for the LLM

Custom Document Template

Setting	What It Does
Document Template	Template string that controls how chunk content and metadata are formatted

Available Tags: {content}, {author}, {title}, {created_date}, {source}, {chunk_index}
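
As a rough illustration of how a document template might be filled in, the sketch below substitutes the tags listed above into a template string. The `render_chunk` helper, the sample template, and the choice to render missing metadata as an empty string are all assumptions; the wizard's actual substitution rules are not documented here.

```python
class _EmptyDefault(dict):
    # Any tag without a value in the chunk metadata renders as "" (an assumption).
    def __missing__(self, key):
        return ""

def render_chunk(template, chunk):
    # Fill {content}, {title}, {source}, etc. from the chunk's fields.
    return template.format_map(_EmptyDefault(chunk))

# Hypothetical template and chunk for illustration.
template = "Title: {title}\nSource: {source} (chunk {chunk_index})\n\n{content}"
chunk = {
    "title": "Guide",
    "source": "guide.pdf",
    "chunk_index": 3,
    "content": "Hello",
}

print(render_chunk(template, chunk))
```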

Contextual Retrieval

Setting	What It Does
Contextual Retrieval Template	Template for LLM-enhanced context assembly
LLM Model	Selects the LLM used to enhance chunk context with surrounding document information

Default Template: Context: {full_document}\n\nChunk: {chunk_context}

Available Tags: {full_document}, {chunk_context}

ML-Optimized Contextual Retrieval

Setting	What It Does
ML Contextual Retrieval Template	Template for ML-enhanced context with summaries and entity extraction
LLM Model	Optional LLM for additional context enhancement

Default Template: Document Summary: {full_document_summary}\nSection: {parent_section_summary}\n\nChunk: {chunk_context}

Available Tags: {chunk_context}, {full_document_summary}, {parent_section_summary}, {entities}, {topics}, {sentiment}, {key_phrases}

Query Template

Query templates allow you to reformat user queries before they are embedded and searched. This is useful for optimizing how different types of questions are processed — for example, wrapping queries in "Question: ..." format can improve results for Q&A use cases.

Setting	What It Does
Query Template	Template for processing user queries before searching

Template Presets

Preset	Template	Use Case
Question Answering	`Question: {query}\nAnswer:`	Q&A chatbots
Code Search	`Find code related to: {query}`	Code repositories
Keyword Search	`{query}`	Direct keyword matching
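
The presets above can be pictured as simple string substitutions applied before the query is embedded. The sketch below is one possible way to do this; the preset keys and the `apply_query_template` helper are hypothetical names, and the platform may process queries differently.

```python
# Preset templates copied from the table above; the dictionary keys are assumed names.
PRESETS = {
    "question_answering": "Question: {query}\nAnswer:",
    "code_search": "Find code related to: {query}",
    "keyword_search": "{query}",
}

def apply_query_template(preset, query):
    # str.replace is used instead of str.format so that braces inside the
    # user's query (common in code searches) are passed through untouched.
    return PRESETS[preset].replace("{query}", query.strip())

print(apply_query_template("question_answering", "What is BM25?"))
print(apply_query_template("code_search", " parse {json} "))
```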

Search Method

Search method defines how your system finds relevant chunks when a query is made. Dense (vector) search finds semantically similar content, while sparse (BM25) search matches exact keywords. Hybrid search runs both and merges the results, so it catches conceptual matches as well as exact keyword hits.

Setting	What It Does
Search Method	Selects the search strategy for finding relevant chunks
Enable BM25	Adds keyword-based BM25 search alongside vector search
BM25 Weight	Controls how much BM25 scores influence final rankings (0-1)
Hybrid Rerank Method	Determines how vector and BM25 results are combined

Search Methods

Method	Description
Dense	Pure vector (embedding) search
Sparse	BM25 keyword search only
Hybrid	Vector + BM25 with reranking
Graph	Vector + knowledge graph traversal

Reranking Methods

Method	Description
RRF	Reciprocal Rank Fusion (recommended)
Reciprocal Rank Fusion	Full RRF with configurable window
Weighted Sum	Weighted combination of scores
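
For readers unfamiliar with these fusion strategies, the sketch below shows the general idea behind Reciprocal Rank Fusion and a weighted sum of scores. It is not the product's implementation: the constant `k=60` is a commonly used default rather than a documented setting, the min-max normalization is one of several reasonable choices, and treating `bm25_weight` as the BM25 Weight setting is an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking is a list of chunk IDs, best first. A chunk earns
    # 1 / (k + rank) from every list it appears in; the sums are then sorted.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def _min_max(scores):
    # Rescale raw scores to [0, 1] so vector and BM25 scores are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: ((s - lo) / span if span else 1.0) for d, s in scores.items()}

def weighted_sum(vector_scores, bm25_scores, bm25_weight=0.3):
    # Blend normalized scores; chunks missing from one list contribute 0 there.
    v, b = _min_max(vector_scores), _min_max(bm25_scores)
    combined = {
        d: (1 - bm25_weight) * v.get(d, 0.0) + bm25_weight * b.get(d, 0.0)
        for d in set(v) | set(b)
    }
    return sorted(combined, key=lambda d: (-combined[d], d))

# Hypothetical results from the two searches.
vector_ranking = ["a", "b", "c"]
bm25_ranking = ["b", "d", "a"]
print(reciprocal_rank_fusion([vector_ranking, bm25_ranking]))

vector_scores = {"a": 0.9, "b": 0.8, "c": 0.5}
bm25_scores = {"b": 12.0, "d": 7.0, "a": 2.0}
print(weighted_sum(vector_scores, bm25_scores, bm25_weight=0.5))
```

RRF uses only rank positions, so it needs no score normalization; a weighted sum uses the scores themselves and is therefore sensitive to how they are scaled.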

RAG Pipeline Settings

Pipeline settings control the final stages of query processing — how many results are returned, what quality threshold they must meet, and how the retrieved context is assembled before being sent to the LLM. These settings balance result quantity against quality and control whether your system returns raw search results or generates natural language answers.

Setting	What It Does
Top K	Number of results to return per query
Score Threshold	Minimum similarity score — results below this are filtered out
Context Assembly	Strategy for ordering and combining retrieved chunks
Max Context Length	Maximum number of tokens in the assembled context
LLM Integration	Enables LLM-powered answer generation (requires LLM model integration)
LLM Model	Selects the LLM model for generating answers
Temperature	Controls LLM creativity (0 = deterministic, 1 = more creative)

Context Assembly Methods

Method	Description
Ranked	Assembles chunks by relevance score (recommended)
Sequential	Assembles chunks in original document order
Weighted	Combines relevance score and document position
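
The difference between Ranked and Sequential assembly, and the effect of Max Context Length, can be sketched as follows. This is an illustrative approximation only: the chunk fields, the `assemble_context` helper, and the whitespace-based token count are assumptions, and the Weighted method is omitted because its formula is not specified here.

```python
def assemble_context(chunks, method="ranked", max_tokens=50):
    # chunks: list of dicts with 'text', 'score', and 'position' (assumed fields).
    if method == "ranked":
        ordered = sorted(chunks, key=lambda c: c["score"], reverse=True)
    elif method == "sequential":
        ordered = sorted(chunks, key=lambda c: c["position"])
    else:
        raise ValueError(f"unsupported method: {method}")

    parts, used = [], 0
    for chunk in ordered:
        # Whitespace word count stands in for a real tokenizer.
        n_tokens = len(chunk["text"].split())
        if used + n_tokens > max_tokens:
            break  # stop once the next chunk would exceed the budget
        parts.append(chunk["text"])
        used += n_tokens
    return "\n\n".join(parts)

# Hypothetical retrieved chunks.
chunks = [
    {"text": "intro text here", "score": 0.4, "position": 0},
    {"text": "key finding", "score": 0.9, "position": 2},
    {"text": "method details", "score": 0.7, "position": 1},
]

print(assemble_context(chunks, method="ranked"))
print(assemble_context(chunks, method="sequential"))
```

In this sketch the budget is applied in the chosen order, so Ranked drops the least relevant chunks first while Sequential drops the chunks that appear latest in the document.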

Start with defaults and adjust top_k and score_threshold based on your evaluation results. Higher thresholds reduce noise but may miss relevant chunks.
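
The interaction between the two settings can be seen in a small sketch. The `filter_results` helper and the sample scores are hypothetical, and the order in which the platform applies the threshold and the Top K cut is an assumption.

```python
def filter_results(results, top_k=5, score_threshold=0.5):
    # results: list of (chunk_id, similarity_score) pairs.
    # Drop anything below the threshold, then keep the top_k best scores.
    kept = [r for r in results if r[1] >= score_threshold]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:top_k]

# Hypothetical search results.
results = [("a", 0.82), ("b", 0.41), ("c", 0.67), ("d", 0.55)]

print(filter_results(results, top_k=2, score_threshold=0.5))
print(filter_results(results, top_k=5, score_threshold=0.7))
```

Raising the threshold from 0.5 to 0.7 in this example leaves a single result, which shows how a stricter threshold can return fewer chunks than Top K allows.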
