Configure embeddings, retrieval methods, search methods, and RAG pipeline parameters.
The pipeline configuration step determines how your documents are embedded, searched, and retrieved. This is the most critical step for retrieval quality.
Embedding Settings
Embedding settings control how your document text is converted into numerical vectors. The embedding model transforms text into points in multi-dimensional space, where semantically similar content ends up closer together. Your choice of model directly impacts search accuracy, speed, and cost.
| Setting | What It Does |
| --- | --- |
| Embedding Model | Selects the AI model that converts text into vector representations for similarity search |
| Dimensions | Vector size (auto-selected based on the chosen model) |
| Similarity Function | Determines how similarity between vectors is calculated |
Available Embedding Models
| Model | Dimensions | Context | Best For |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 384 | 256 tokens | Fast prototyping and lightweight use |
| all-mpnet-base-v2 | 768 | 514 tokens | Balanced quality and speed |
| Jina-Embeddings-v2-Base-EN | 768 | 8192 tokens | Long documents |
| BGE-Large-EN-v1.5 | 1024 | 512 tokens | Enterprise-grade performance |
| E5-Large-v2 | 1024 | 512 tokens | High-quality English retrieval |
| BGE-M3 | 1024 | 8192 tokens | Multilingual + long context |
| Stella-EN-1.5B-v5 | 1024 | 512 tokens | State-of-the-art performance (NovaSearch) |
Some embedding models are only available on higher-tier plans. Check your plan details or contact sales for model availability.
Similarity Functions
| Function | Description |
| --- | --- |
| Cosine | Measures angle between vectors (recommended for most cases) |
| Euclidean | Measures straight-line distance between vectors |
| Dot Product | Measures alignment between vectors |
| Manhattan | Measures distance using grid-like paths |
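To make the four functions concrete, here is a minimal pure-Python sketch of how each one could be computed for a pair of vectors. The function names and the toy vectors are illustrative only and are not part of the product; a real pipeline would run these over model embeddings, typically with a vector library.

```python
import math


def dot_product(a, b):
    """Alignment between vectors: grows with both direction match and length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between vectors; ignores vector length."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of per-dimension absolute differences (grid-like path)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy vectors (hypothetical, not real embeddings).
query = [1.0, 0.0, 1.0]
doc_a = [2.0, 0.0, 2.0]  # same direction as the query, twice the length
doc_b = [0.0, 1.0, 0.0]  # orthogonal to the query

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(
        name,
        cosine_similarity(query, doc),
        dot_product(query, doc),
        euclidean_distance(query, doc),
        manhattan_distance(query, doc),
    )
```

Note that cosine and dot product are similarities (higher is closer), while Euclidean and Manhattan are distances (lower is closer). Cosine treats `doc_a` as a perfect match despite its different length, which is one reason it is a common default for text embeddings.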
Retrieval Method
Retrieval method determines how the chunks found by your search are assembled into the context window sent to the LLM. Different retrieval methods provide varying levels of context richness — from simple raw chunk content to LLM-enhanced summaries that include surrounding document structure and extracted entities.
| Setting | What It Does |
| --- | --- |
| Retrieval Method | Determines how retrieved chunks are assembled into context for the LLM |
Custom Document Template
| Setting | What It Does |
| --- | --- |
| Document Template | Template string that controls how chunk content and metadata are formatted |

Available Tags: {content}, {author}, {title}, {created_date}, {source}, {chunk_index}
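As a rough illustration of how such a template might be filled in, the sketch below renders one chunk with Python's built-in `str.format`. This assumes the tags behave like simple placeholders; the product's actual substitution rules are not documented here, and `render_chunk` and the sample chunk are hypothetical.

```python
# The tag names come from the list above; everything else is illustrative.
AVAILABLE_TAGS = ("content", "author", "title", "created_date", "source", "chunk_index")


def render_chunk(template: str, chunk: dict) -> str:
    """Fill a document template, using an empty string for missing metadata."""
    fields = {tag: chunk.get(tag, "") for tag in AVAILABLE_TAGS}
    return template.format(**fields)


template = "Title: {title}\nSource: {source} (chunk {chunk_index})\n\n{content}"

# Hypothetical chunk; it has no author or created_date metadata.
chunk = {
    "content": "Reset your password from the account settings page.",
    "title": "Account FAQ",
    "source": "faq.md",
    "chunk_index": 3,
}

rendered = render_chunk(template, chunk)
print(rendered)
```

Putting the title and source ahead of `{content}` gives the LLM provenance for each chunk, at the cost of a few extra tokens per chunk.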
Contextual Retrieval
| Setting | What It Does |
| --- | --- |
| Contextual Retrieval Template | Template for LLM-enhanced context assembly |
| LLM Model | Selects the LLM used to enhance chunk context with surrounding document information |
Query Template
Query templates reformat user queries before they are embedded and searched. This lets you tune how different kinds of questions are processed. For example, wrapping queries in a "Question: ..." format can improve results for Q&A use cases.
| Setting | What It Does |
| --- | --- |
| Query Template | Template for processing user queries before searching |
Template Presets
| Preset | Template | Use Case |
| --- | --- | --- |
| Question Answering | Question: {query}\nAnswer: | Q&A chatbots |
| Code Search | Find code related to: {query} | Code repositories |
| Keyword Search | {query} | Direct keyword matching |
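The presets above amount to a string substitution applied before the query is embedded. The sketch below shows that step under two assumptions: `{query}` is the only placeholder, and `\n` in the Question Answering preset stands for a line break. `apply_query_template` is a hypothetical helper, not a product API.

```python
# Preset templates copied from the table above.
PRESETS = {
    "Question Answering": "Question: {query}\nAnswer:",
    "Code Search": "Find code related to: {query}",
    "Keyword Search": "{query}",
}


def apply_query_template(template: str, query: str) -> str:
    """Substitute the user's query into the template.

    str.replace is used instead of str.format so that braces typed by the
    user (common in code searches) are passed through untouched.
    """
    return template.replace("{query}", query.strip())


user_query = "How do I rotate an API key?"
for name, template in PRESETS.items():
    print(f"{name!r}: {apply_query_template(template, user_query)!r}")
```

The reformatted string, not the raw user input, is what would be embedded and searched.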
Search Method
Search method defines how the system finds relevant chunks when a query is made. Dense (vector) search finds semantically similar content, while sparse (BM25) search matches exact keywords. Hybrid search combines the two, catching both conceptual matches and exact keyword hits. Graph search pairs vector search with knowledge graph traversal.
| Setting | What It Does |
| --- | --- |
| Search Method | Selects the search strategy for finding relevant chunks |
| | Controls how much BM25 scores influence final rankings (0-1) |
| Hybrid Rerank Method | Determines how vector and BM25 results are combined |
Search Methods
| Method | Description |
| --- | --- |
| Dense | Pure vector (embedding) search |
| Sparse | BM25 keyword search only |
| Hybrid | Vector + BM25 with reranking |
| Graph | Vector + knowledge graph traversal |
Reranking Methods
| Method | Description |
| --- | --- |
| RRF | Reciprocal Rank Fusion (recommended) |
| Reciprocal Rank Fusion | Full RRF with configurable window |
| Weighted Sum | Weighted combination of scores |
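The sketch below shows the general idea behind the two fusion families: Reciprocal Rank Fusion works on ranks only, while a weighted sum blends normalized scores. It is not the product's implementation. The constant `k=60` is a value commonly used for RRF, and `bm25_weight`, the min-max normalization, and the sample data are all assumptions made for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids; each list adds 1 / (k + rank) per chunk."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))


def min_max(scores):
    """Rescale scores to [0, 1] so vector and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 1.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def weighted_sum(vector_scores, bm25_scores, bm25_weight=0.3):
    """Blend normalized scores; bm25_weight (0-1) is a hypothetical parameter."""
    v, b = min_max(vector_scores), min_max(bm25_scores)
    combined = {
        c: (1 - bm25_weight) * v.get(c, 0.0) + bm25_weight * b.get(c, 0.0)
        for c in set(v) | set(b)
    }
    return sorted(combined, key=lambda c: (-combined[c], c))


# Hypothetical results for one query.
vector_scores = {"c1": 0.92, "c2": 0.85, "c3": 0.80}
bm25_scores = {"c3": 12.0, "c1": 7.5, "c4": 3.0}
vector_ranking = ["c1", "c2", "c3"]
bm25_ranking = ["c3", "c1", "c4"]

print(reciprocal_rank_fusion([vector_ranking, bm25_ranking]))
print(weighted_sum(vector_scores, bm25_scores, bm25_weight=0.3))
```

Because RRF ignores raw score magnitudes, it needs no normalization between vector and BM25 scores, which makes it a robust default. A weighted sum gives finer control but depends on how the two score scales are normalized.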
RAG Pipeline Settings
Pipeline settings control the final stages of query processing — how many results are returned, what quality threshold they must meet, and how the retrieved context is assembled before being sent to the LLM. These settings balance result quantity against quality and control whether your system returns raw search results or generates natural language answers.
| Setting | What It Does |
| --- | --- |
| Top K | Number of results to return per query |
| Score Threshold | Minimum similarity score; results below this are filtered out |
| Context Assembly | Strategy for ordering and combining retrieved chunks |
| Max Context Length | Maximum number of tokens in the assembled context |
| LLM Integration | Enables LLM-powered answer generation (requires LLM model integration) |
| LLM Model | Selects the LLM used to generate answers |
| Temperature | Controls LLM creativity (0 = deterministic, 1 = more creative) |
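To show how Top K and Score Threshold interact, here is a small sketch that filters scored results and then truncates them. Applying the threshold before the Top K cut is an assumption about ordering, and `select_results` and the sample scores are hypothetical.

```python
def select_results(scored_chunks, top_k=5, score_threshold=0.5):
    """Drop results below the threshold, then keep the top_k best of the rest."""
    kept = [(cid, score) for cid, score in scored_chunks if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical (chunk id, similarity score) pairs for one query.
hits = [("a", 0.91), ("b", 0.42), ("c", 0.77), ("d", 0.63), ("e", 0.58)]

print(select_results(hits, top_k=3, score_threshold=0.6))
print(select_results(hits, top_k=3, score_threshold=0.8))
```

With the stricter threshold, fewer than `top_k` results come back: Top K is an upper bound on the result count, not a guarantee.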
Context Assembly Methods
| Method | Description |
| --- | --- |
| Ranked | Assembles chunks by relevance score (recommended) |
| Sequential | Assembles chunks in original document order |
| Weighted | Combines relevance score and document position |
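The difference between Ranked and Sequential assembly, and the effect of Max Context Length, can be sketched as follows. This is an illustration under simplifying assumptions: tokens are approximated by whitespace-separated words, the `Chunk` type and `assemble_context` are hypothetical, and Weighted is omitted because the formula it uses to combine score and position is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float       # relevance score from search
    chunk_index: int   # position in the original document


def assemble_context(chunks, method="ranked", max_tokens=50):
    """Order chunks, then add them until the token budget would be exceeded."""
    if method == "ranked":
        ordered = sorted(chunks, key=lambda c: c.score, reverse=True)
    elif method == "sequential":
        ordered = sorted(chunks, key=lambda c: c.chunk_index)
    else:
        raise ValueError(f"unsupported method: {method}")

    parts, used = [], 0
    for chunk in ordered:
        n_tokens = len(chunk.text.split())  # crude stand-in for a real tokenizer
        if used + n_tokens > max_tokens:
            break
        parts.append(chunk.text)
        used += n_tokens
    return "\n\n".join(parts)


# Hypothetical retrieved chunks from a single document.
chunks = [
    Chunk("Step two: confirm the email.", score=0.9, chunk_index=2),
    Chunk("Step one: open settings.", score=0.7, chunk_index=1),
    Chunk("Step three: choose a new password.", score=0.8, chunk_index=3),
]

print(assemble_context(chunks, method="ranked"))
print(assemble_context(chunks, method="sequential"))
```

Ranked order puts the strongest evidence first, which matters most when the token budget cuts off the tail. Sequential order preserves the narrative flow of procedural content such as the steps above.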
Start with defaults and adjust top_k and score_threshold based on your evaluation results. Higher thresholds reduce noise but may miss relevant chunks.