
Configure the embedding model, search method, and retrieval settings in Step 4 of the wizard.
The Pipeline Configuration step sets up how your chunks are embedded, stored, and retrieved. This is the core of your RAG system.
The embedding model converts text chunks into numerical vectors for similarity search.
| Model | Dimensions | Context | Speed | Cost | Best For |
|---|---|---|---|---|---|
| text-embedding-3-small | 1536 | 8192 tokens | Fast | Low | General purpose, large scale |
| text-embedding-3-large | 3072 | 8192 tokens | Medium | Medium | High precision, critical apps |
| text-embedding-ada-002 | 1536 | 8192 tokens | Medium | Medium | Legacy compatibility |
| all-MiniLM-L6-v2 | 384 | 512 tokens | Very fast | Free | Local, resource-constrained |
| all-mpnet-base-v2 | 768 | 512 tokens | Fast | Free | Balanced quality/speed |
Start with text-embedding-3-small for most use cases. It offers the best balance of quality, speed, and cost. Switch to text-embedding-3-large only if you need higher precision.
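The Dimensions column also affects how much space your vectors take up. The sketch below estimates raw vector storage from the dimensions in the table. It assumes vectors are stored as 32-bit floats with no compression, index, or metadata overhead, which is an assumption and not a statement about how this pipeline stores them. The helper name `estimate_storage_mib` is hypothetical.

```python
# Dimensions copied from the model table above.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}

BYTES_PER_FLOAT32 = 4  # assumption: one float32 per dimension


def estimate_storage_mib(model: str, num_chunks: int) -> float:
    """Raw vector storage in MiB, ignoring index and metadata overhead."""
    dims = MODEL_DIMENSIONS[model]
    return dims * BYTES_PER_FLOAT32 * num_chunks / (1024 ** 2)


if __name__ == "__main__":
    for name in MODEL_DIMENSIONS:
        size = estimate_storage_mib(name, 100_000)
        print(f"{name}: {size:.1f} MiB for 100,000 chunks")
```

Under these assumptions, doubling the dimensions doubles the raw storage, so text-embedding-3-large needs about twice the space of text-embedding-3-small for the same number of chunks.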
| Method | Description | When to Use |
|---|---|---|
| Dense | Vector similarity (semantic meaning) | Conceptual questions, natural language |
| Sparse | BM25 keyword matching | Exact terms, product codes, names |
| Hybrid | Dense + Sparse combined | Default — best for most queries |
| Graph | Knowledge graph traversal | Entity relationships, multi-hop reasoning |
Start with Hybrid. It consistently outperforms single-method search across diverse query types. Switch to Dense or Sparse only after profiling your query distribution.
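To illustrate what "Dense + Sparse combined" can mean, the sketch below merges two result sets with a weighted sum of normalized scores. This is only one common approach. The function names, the min-max normalization, and the 0.3 sparse weight are assumptions for illustration, and the fusion method this pipeline actually uses may differ.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    dense: dict[str, float],
    sparse: dict[str, float],
    sparse_weight: float = 0.3,  # hypothetical weight for the keyword score
) -> list[tuple[str, float]]:
    """Combine dense and sparse scores; a missing score counts as 0."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    combined = {}
    for doc_id in set(dense_n) | set(sparse_n):
        combined[doc_id] = (
            (1 - sparse_weight) * dense_n.get(doc_id, 0.0)
            + sparse_weight * sparse_n.get(doc_id, 0.0)
        )
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    dense = {"a": 0.9, "b": 0.7, "c": 0.5}       # e.g. cosine similarities
    sparse = {"b": 12.0, "c": 3.0, "d": 6.0}     # e.g. BM25 scores
    for doc_id, score in hybrid_scores(dense, sparse):
        print(doc_id, round(score, 3))
```

In this example, document `d` is found only by keyword matching and still appears in the combined ranking, which is the main reason hybrid search helps with exact terms and product codes.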
| Function | Description | Best For |
|---|---|---|
| Cosine | Angle between vectors (normalized) | Default — works well for all models |
| Euclidean | Straight-line distance | When magnitude matters |
| Dot Product | Sum of element-wise products (not normalized) | Fastest computation; ranks the same as cosine when vectors are unit length |
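The three functions in the table can be written out in a few lines of plain Python. This is a minimal sketch for intuition, not the pipeline's implementation; the function names are illustrative.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by both vector lengths; ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    a = [1.0, 2.0, 2.0]
    b = [2.0, 4.0, 4.0]  # same direction as a, twice the length
    print(cosine_similarity(a, b))   # 1.0
    print(euclidean_distance(a, b))  # 3.0
    print(dot_product(a, b))         # 18.0
```

The two vectors point in the same direction, so cosine reports a perfect match, while Euclidean distance and dot product both react to the difference in length.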
Enable BM25 to combine semantic search with keyword matching.
Benefits:
Config:
```json
{
  "embedding": {
    "enable_bm25": true,
    "bm25_weight": 0.3
  }
}
```
Choose whether to generate answers from retrieved chunks.
| Option | Description |
|---|---|
| Chunks only | Return raw matching chunks (no LLM call) |
| LLM answer | Generate natural language answer from chunks |
When LLM answer is enabled:
LLM answers add ~1-3 seconds per query and incur LLM API costs. Start with "Chunks only" to verify retrieval quality before enabling.
Before deploying, use the built-in test tool to verify your pipeline:
What to check:
Customize how queries are processed. The default template works for most cases:
```text
Find information that answers: {query}
```
For specialized use cases:
```text
# Technical support
Find troubleshooting steps or solutions for: {query}

# Legal research
Find relevant clauses, sections, or precedents for: {query}

# Product search
Find product features, specifications, or comparisons for: {query}
```
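As a rough sketch of how a template with a `{query}` placeholder is applied, the snippet below fills the templates above using Python string formatting. The dictionary keys and the `render_query` helper are hypothetical, and it is an assumption that the pipeline performs a plain placeholder substitution like this.

```python
# Template strings copied from the examples above; the keys are illustrative.
TEMPLATES = {
    "default": "Find information that answers: {query}",
    "technical_support": "Find troubleshooting steps or solutions for: {query}",
    "legal_research": "Find relevant clauses, sections, or precedents for: {query}",
    "product_search": "Find product features, specifications, or comparisons for: {query}",
}


def render_query(template_name: str, query: str) -> str:
    """Insert the user's query into the chosen template."""
    template = TEMPLATES[template_name]
    # The query is passed as an argument, so braces inside it are kept as-is.
    return template.format(query=query.strip())


if __name__ == "__main__":
    print(render_query("technical_support", "printer shows offline"))
```

Because the rendered string is what gets embedded and searched, the wording of the template nudges retrieval toward the kind of content it names.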
A full pipeline configuration combining the settings from this step:
```json
{
  "embedding": {
    "model": "text-embedding-3-small",
    "enable_bm25": true,
    "similarity": "cosine"
  },
  "retrieval": {
    "search_method": "hybrid",
    "limit": 5,
    "threshold": 0.7
  },
  "llm": {
    "enabled": false
  }
}
```
After configuring your pipeline, move to API Endpoints to get your API key and start integrating.
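For reference, the `limit` and `threshold` values in the `retrieval` settings can be read as "return at most 5 chunks, and only those scoring 0.7 or higher". The sketch below applies that reading to a list of scored chunks. This interpretation, the inclusive comparison, and the `select_chunks` helper are all assumptions for illustration.

```python
import json

# The "retrieval" block from the configuration shown earlier.
RETRIEVAL_JSON = """
{
  "search_method": "hybrid",
  "limit": 5,
  "threshold": 0.7
}
"""


def select_chunks(
    scored_chunks: list[tuple[str, float]],
    limit: int,
    threshold: float,
) -> list[tuple[str, float]]:
    """Drop chunks scoring below the threshold, then keep the top `limit`."""
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    retrieval = json.loads(RETRIEVAL_JSON)
    candidates = [
        ("c1", 0.91), ("c2", 0.65), ("c3", 0.78), ("c4", 0.70),
        ("c5", 0.88), ("c6", 0.95), ("c7", 0.72),
    ]
    for chunk, score in select_chunks(
        candidates, retrieval["limit"], retrieval["threshold"]
    ):
        print(chunk, score)
```

Raising the threshold trades recall for precision, so a query with few strong matches may return fewer than `limit` chunks.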