Pipeline Configuration

Configure the embedding model, search method, and retrieval settings in Step 4 of the wizard.

The Pipeline Configuration step sets up how your chunks are embedded, stored, and retrieved. This is the core of your RAG system.

Embedding Model

The embedding model converts text chunks into numerical vectors for similarity search.

Model	Dimensions	Context	Speed	Cost	Best For
text-embedding-3-small	1536	8192 tokens	Fast	Low	General purpose, large scale
text-embedding-3-large	3072	8192 tokens	Medium	Medium	High precision, critical apps
text-embedding-ada-002	1536	8192 tokens	Medium	Medium	Legacy compatibility
all-MiniLM-L6-v2	384	256 tokens	Very fast	Free	Local, resource-constrained
all-mpnet-base-v2	768	384 tokens	Fast	Free	Balanced quality/speed

Start with text-embedding-3-small for most use cases. It offers the best balance of quality, speed, and cost. Switch to text-embedding-3-large only if you need higher precision.
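
The Dimensions column also drives index size. As a rough sketch, assuming vectors are stored as 32-bit floats with no compression or index overhead (the storage format is not documented on this page), raw size is chunks x dimensions x 4 bytes. The function name below is illustrative, not part of the product.

```python
# Rough estimate of raw vector storage.
# Assumption: float32 values (4 bytes each); ignores index overhead,
# metadata, and any compression the vector store may apply.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}

BYTES_PER_FLOAT32 = 4


def estimate_vector_storage_mib(model: str, num_chunks: int) -> float:
    """Return the approximate raw vector size in MiB for num_chunks chunks."""
    dims = MODEL_DIMENSIONS[model]
    return num_chunks * dims * BYTES_PER_FLOAT32 / (1024 * 1024)


if __name__ == "__main__":
    for name in MODEL_DIMENSIONS:
        size = estimate_vector_storage_mib(name, 100_000)
        print(f"{name}: {size:.1f} MiB for 100,000 chunks")
```

Doubling the dimensions doubles the raw storage, which is one reason to move to text-embedding-3-large only when the precision gain is worth it.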

Search Method

Method	Description	When to Use
Dense	Vector similarity (semantic meaning)	Conceptual questions, natural language
Sparse	BM25 keyword matching	Exact terms, product codes, names
Hybrid	Dense + Sparse combined	Default — best for most queries
Graph	Knowledge graph traversal	Entity relationships, multi-hop reasoning

Start with Hybrid. It consistently outperforms single-method search across diverse query types. Switch to Dense or Sparse only after profiling your query distribution.
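
To make the Sparse row concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the platform's implementation: the whitespace tokenizer and the parameters k1=1.5 and b=0.75 are common defaults chosen here as assumptions. It shows why keyword matching finds exact tokens such as product codes.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer; production systems usually do more."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "Replacement filter for model XR-200",
    "How to clean a vacuum filter",
    "Return policy for online orders",
]
scores = bm25_scores("XR-200 filter", docs)
```

Only the first document contains the exact token XR-200, so it scores highest; the third shares no terms with the query and scores zero.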

Similarity Function

Function	Description	Best For
Cosine	Angle between vectors (normalized)	Default — works well for all models
Euclidean	Straight-line distance	When magnitude matters
Dot Product	Unnormalized inner product of the two vectors	Fastest computation; ranks the same as cosine when vectors are unit length
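
The three functions in the table can be written in a few lines of plain Python. This is an illustrative sketch with hypothetical helper names; a vector store would use optimized native code instead.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(dot_product(a, a))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by both lengths, so magnitude cancels out."""
    return dot_product(a, b) / (norm(a) * norm(b))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))    # cosine of the angle between a and b
print(dot_product(a, b))          # grows with vector length
print(euclidean_distance(a, b))   # distance, not similarity
```

For unit-length vectors, cosine similarity and dot product give the same score, and Euclidean distance gives the same ranking, so the choice mostly matters when the embedding model does not normalize its output.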

BM25 Hybrid Search

Enable BM25 to combine semantic search with keyword matching.

Benefits:

Catches exact term matches that semantic search might miss
Better for proper nouns, product codes, and technical terms
No additional cost

Config:

{
  "embedding": {
    "enable_bm25": true,
    "bm25_weight": 0.3
  }
}
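
This page does not say how bm25_weight is applied. One common interpretation, shown here purely as an assumption, is a weighted sum: BM25 scores are min-max normalized to the 0-1 range, then blended with the dense similarity score. The function names are hypothetical.

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale scores to the 0-1 range; all-equal input maps to zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(dense: list[float], bm25: list[float], bm25_weight: float = 0.3) -> list[float]:
    """Blend dense and BM25 scores for the same candidates, in the same order.

    Assumed formula: (1 - bm25_weight) * dense + bm25_weight * normalized_bm25.
    """
    if len(dense) != len(bm25):
        raise ValueError("dense and bm25 must score the same candidates")
    sparse = min_max_normalize(bm25)
    return [(1 - bm25_weight) * d + bm25_weight * s for d, s in zip(dense, sparse)]


dense = [0.80, 0.60, 0.75]   # cosine similarity per candidate chunk
bm25 = [0.0, 4.0, 2.0]       # raw BM25 score per candidate chunk
combined = hybrid_scores(dense, bm25, bm25_weight=0.3)
```

Under this assumption, the second chunk moves from last place on dense score alone to first place once its strong keyword match is counted.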

LLM Integration

Choose whether to generate answers from retrieved chunks.

Option	Description
Chunks only	Return raw matching chunks (no LLM call)
LLM answer	Generate natural language answer from chunks

When LLM answer is enabled:

Retrieved chunks are sent as context to the LLM
Response includes both the answer and source chunks
Adds latency and cost per query

LLM answers add ~1-3 seconds per query and incur LLM API costs. Start with "Chunks only" to verify retrieval quality before enabling.
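
As a sketch of what "sent as context" can look like, the function below numbers each retrieved chunk and places it ahead of the question so the answer can cite its sources. The prompt wording and the chunk fields (source, text) are assumptions for illustration; the platform's actual prompt is not documented here.

```python
def build_llm_prompt(query: str, chunks: list[dict]) -> str:
    """Assemble a prompt from retrieved chunks.

    Each chunk is a dict with hypothetical keys "source" and "text".
    Chunks are numbered so the model can cite them as [1], [2], ...
    """
    context_lines = [
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


chunks = [
    {"source": "faq.md", "text": "Reset your password from Settings > Security."},
    {"source": "admin.md", "text": "Admins can force a reset for any user."},
]
prompt = build_llm_prompt("How do I reset my password?", chunks)
print(prompt)
```

Because the prompt is built entirely from retrieved chunks, poor retrieval produces poor answers, which is why verifying retrieval with "Chunks only" comes first.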

Pipeline Test Tool

Before deploying, use the built-in test tool to verify your pipeline:

Enter a test query in the search box
Click Test Pipeline
Review results:
- Matching chunks with similarity scores
- Processing time
- Search method used
- LLM answer (if enabled)

What to check:

Top results are relevant to the query
Similarity scores are above 0.70
Processing time is acceptable (< 2s for chunks only)
LLM answer is accurate and cites sources
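
The numeric checks above can be automated once test output is captured. The sketch below assumes a hypothetical result shape (a list of similarity scores, a processing time in seconds, and an LLM flag); the test tool's real output format may differ. Relevance and answer accuracy still need a human review.

```python
from dataclasses import dataclass


@dataclass
class PipelineTestResult:
    """Hypothetical container for one test query's output."""
    scores: list[float]
    processing_time_s: float
    llm_enabled: bool = False


def check_pipeline_result(
    result: PipelineTestResult,
    min_score: float = 0.70,
    max_time_s: float = 2.0,
) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not result.scores:
        problems.append("no chunks returned")
    elif min(result.scores) < min_score:
        problems.append(f"lowest similarity {min(result.scores):.2f} is below {min_score:.2f}")
    # The 2 s budget in the checklist applies to chunks-only queries.
    if not result.llm_enabled and result.processing_time_s >= max_time_s:
        problems.append(f"processing took {result.processing_time_s:.2f}s, budget is {max_time_s:.2f}s")
    return problems


good = PipelineTestResult(scores=[0.82, 0.74], processing_time_s=0.4)
slow = PipelineTestResult(scores=[0.82, 0.65], processing_time_s=2.5)
print(check_pipeline_result(good))
print(check_pipeline_result(slow))
```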

Query Template

Customize how queries are processed. The default template works for most cases:

Find information that answers: {query}

For specialized use cases:

# Technical support
Find troubleshooting steps or solutions for: {query}

# Legal research
Find relevant clauses, sections, or precedents for: {query}

# Product search
Find product features, specifications, or comparisons for: {query}
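
A template of this kind is a string with a {query} placeholder. The sketch below assumes plain substitution before the query is embedded, which this page does not confirm; the dictionary keys and function name are hypothetical.

```python
TEMPLATES = {
    "default": "Find information that answers: {query}",
    "technical_support": "Find troubleshooting steps or solutions for: {query}",
    "legal_research": "Find relevant clauses, sections, or precedents for: {query}",
    "product_search": "Find product features, specifications, or comparisons for: {query}",
}


def render_query(user_query: str, template_name: str = "default") -> str:
    """Fill the {query} placeholder of the chosen template."""
    template = TEMPLATES[template_name]
    # str.format substitutes the value as-is, so braces typed by the
    # user are not interpreted as placeholders.
    return template.format(query=user_query.strip())


print(render_query("printer shows error E05", "technical_support"))
```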

Full Pipeline Config Example

{
  "embedding": {
    "model": "text-embedding-3-small",
    "enable_bm25": true,
    "similarity": "cosine"
  },
  "retrieval": {
    "search_method": "hybrid",
    "limit": 5,
    "threshold": 0.7
  },
  "llm": {
    "enabled": false
  }
}
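
To show how the retrieval settings interact, the sketch below loads a trimmed copy of this config and applies limit and threshold to a list of scored results. It assumes threshold is a minimum similarity score (inclusive) and limit is the maximum number of chunks returned; those semantics and the result shape are assumptions, not documented behavior.

```python
import json

CONFIG_JSON = """
{
  "retrieval": {
    "search_method": "hybrid",
    "limit": 5,
    "threshold": 0.7
  }
}
"""


def apply_retrieval_settings(results: list[dict], retrieval: dict) -> list[dict]:
    """Keep results at or above the threshold, best first, capped at limit."""
    kept = [r for r in results if r["score"] >= retrieval["threshold"]]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[: retrieval["limit"]]


config = json.loads(CONFIG_JSON)
results = [
    {"id": "a", "score": 0.90},
    {"id": "b", "score": 0.65},
    {"id": "c", "score": 0.72},
    {"id": "d", "score": 0.71},
    {"id": "e", "score": 0.80},
    {"id": "f", "score": 0.75},
    {"id": "g", "score": 0.95},
]
top = apply_retrieval_settings(results, config["retrieval"])
print([r["id"] for r in top])
```

Here chunk b is dropped by the threshold and chunk d by the limit, so raising either setting shrinks the context passed downstream.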

Next Step

After configuring your pipeline, move to API Endpoints to get your API key and start integrating.

Do

Test with real queries before deploying
Start with the Hybrid search method
Use text-embedding-3-small for most cases
Enable BM25 for better keyword matching

Don't

Deploy without testing the pipeline
Enable LLM answers before verifying retrieval
Use cosine for everything without testing other similarity functions
Set the threshold so high that valid results are filtered out
