Guided Mind v2.4

Pipeline Configuration

Configure the embedding model, search method, and retrieval settings in Step 4 of the wizard.

The Pipeline Configuration step sets up how your chunks are embedded, stored, and retrieved. This is the core of your RAG system.

Embedding Model

The embedding model converts text chunks into numerical vectors for similarity search.

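| Model                  | Dimensions | Context     | Speed     | Cost   | Best For                      |
|------------------------|------------|-------------|-----------|--------|-------------------------------|
| text-embedding-3-small | 1536       | 8192 tokens | Fast      | Low    | General purpose, large scale  |
| text-embedding-3-large | 3072       | 8192 tokens | Medium    | Medium | High precision, critical apps |
| text-embedding-ada-002 | 1536       | 8192 tokens | Medium    | Medium | Legacy compatibility          |
| all-MiniLM-L6-v2       | 384        | 512 tokens  | Very fast | Free   | Local, resource-constrained   |
| all-mpnet-base-v2      | 768        | 512 tokens  | Fast      | Free   | Balanced quality/speed        |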

Start with text-embedding-3-small for most use cases. It offers the best balance of quality, speed, and cost. Switch to text-embedding-3-large only if you need higher precision.
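Vector dimensions also drive storage size, which can matter when choosing between models at scale. The sketch below estimates the raw vector payload from the dimensions listed in the table above. It assumes vectors are stored as 32-bit floats and ignores index overhead and metadata; the function name is hypothetical and not part of the product.

```python
# Dimensions copied from the model table above.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}

BYTES_PER_FLOAT32 = 4  # assumption: each vector component is a 32-bit float


def estimate_vector_storage_bytes(model: str, num_chunks: int) -> int:
    """Return the raw vector payload in bytes (no index overhead, no metadata)."""
    return MODEL_DIMENSIONS[model] * BYTES_PER_FLOAT32 * num_chunks


if __name__ == "__main__":
    for name in MODEL_DIMENSIONS:
        size_mb = estimate_vector_storage_bytes(name, 100_000) / 1_000_000
        print(f"{name}: {size_mb:.1f} MB for 100,000 chunks")
```

Under these assumptions, text-embedding-3-large needs twice the vector storage of text-embedding-3-small for the same number of chunks.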

Search Method

| Method | Description                          | When to Use                               |
|--------|--------------------------------------|-------------------------------------------|
| Dense  | Vector similarity (semantic meaning) | Conceptual questions, natural language    |
| Sparse | BM25 keyword matching                | Exact terms, product codes, names         |
| Hybrid | Dense + Sparse combined              | Default; best for most queries            |
| Graph  | Knowledge graph traversal            | Entity relationships, multi-hop reasoning |

Start with Hybrid. It consistently outperforms single-method search across diverse query types. Switch to Dense or Sparse only after profiling your query distribution.

Similarity Function

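| Function    | Description                                    | Best For                            |
|-------------|------------------------------------------------|-------------------------------------|
| Cosine      | Angle between vectors (normalized)             | Default; works well for all models  |
| Euclidean   | Straight-line distance                         | When magnitude matters              |
| Dot Product | Sum of element-wise products (not normalized)  | Fastest computation                 |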
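To make the difference between the three functions concrete, here is a minimal pure-Python sketch of each one. It is an illustration of the standard definitions, not the product's internal implementation, and the function names are chosen for this example.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; sensitive to vector magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both vector lengths; depends only on the angle."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    a = [1.0, 2.0, 2.0]
    b = [2.0, 4.0, 4.0]  # same direction as a, twice the length
    print(cosine_similarity(a, b))   # 1.0: identical direction
    print(dot_product(a, b))         # 18.0: grows with magnitude
    print(euclidean_distance(a, b))  # 3.0: non-zero because lengths differ
```

The two example vectors point in the same direction, so cosine treats them as identical while Euclidean distance and dot product both react to the difference in length. For unit-length vectors, dot product and cosine produce the same values.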

BM25 Keyword Matching

Enable BM25 to combine semantic search with keyword matching.

Benefits:

  • Catches exact term matches that semantic search might miss
  • Better for proper nouns, product codes, and technical terms
  • No additional cost

Config:

{
  "embedding": {
    "enable_bm25": true,
    "bm25_weight": 0.3
  }
}
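The documentation above does not spell out how bm25_weight is applied. One common approach, shown here purely as an assumption, is a linear blend: BM25 scores are min-max normalized to the 0 to 1 range and mixed with the dense similarity score, with the dense score weighted by 1 - bm25_weight. The function names are hypothetical, and dense scores are assumed to already lie between 0 and 1.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend_scores(dense_scores, bm25_scores, bm25_weight=0.3):
    """Linear blend of dense and BM25 scores (an assumed scheme, see lead-in)."""
    if not 0.0 <= bm25_weight <= 1.0:
        raise ValueError("bm25_weight must be between 0 and 1")
    if len(dense_scores) != len(bm25_scores):
        raise ValueError("score lists must have the same length")
    bm25_norm = min_max_normalize(bm25_scores)
    return [
        (1.0 - bm25_weight) * dense + bm25_weight * keyword
        for dense, keyword in zip(dense_scores, bm25_norm)
    ]


if __name__ == "__main__":
    # Chunk 0 is semantically closer; chunk 1 contains the exact keyword.
    dense = [0.8, 0.6]
    bm25 = [2.0, 10.0]
    print(blend_scores(dense, bm25, bm25_weight=0.3))
```

In this example the second chunk has the lower dense score but the stronger keyword match, so the blend ranks it first. That is the kind of exact-term case the benefits list above describes.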

LLM Integration

Choose whether to generate answers from retrieved chunks.

| Option      | Description                                 |
|-------------|---------------------------------------------|
| Chunks only | Return raw matching chunks (no LLM call)    |
| LLM answer  | Generate natural language answer from chunks |

When LLM answer is enabled:

  • Retrieved chunks are sent as context to the LLM
  • Response includes both the answer and source chunks
  • Adds latency and cost per query

LLM answers add ~1-3 seconds per query and incur LLM API costs. Start with "Chunks only" to verify retrieval quality before enabling.
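As a rough picture of what "sending chunks as context" can look like, the sketch below assembles a prompt from a query and a list of retrieved chunks. The prompt wording and the function name are assumptions made for this example; the product's actual prompt format is not documented here.

```python
def build_context_prompt(query, chunks):
    """Number each chunk so an answer can cite its sources, then append the question."""
    numbered = [f"[{i}] {text}" for i, text in enumerate(chunks, start=1)]
    context = "\n\n".join(numbered)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    prompt = build_context_prompt(
        "How do I reset my password?",
        ["Open Settings and choose Security.", "Select Reset Password and follow the email link."],
    )
    print(prompt)
```

Every chunk added to the context increases the prompt size, which is one reason LLM answers cost more per query than returning chunks alone.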

Pipeline Test Tool

Before deploying, use the built-in test tool to verify your pipeline:

  1. Enter a test query in the search box
  2. Click Test Pipeline
  3. Review results:
    • Matching chunks with similarity scores
    • Processing time
    • Search method used
    • LLM answer (if enabled)

What to check:

  • Top results are relevant to the query
  • Similarity scores are above 0.70
  • Processing time is acceptable (< 2s for chunks only)
  • LLM answer is accurate and cites sources
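The numeric checks in the list above can be automated once test results are available as data. This is a minimal sketch that assumes a result carries a list of similarity scores and a processing time in seconds; the PipelineTestResult class and its fields are hypothetical and do not reflect the test tool's real output format. Relevance and answer accuracy still need a human review.

```python
from dataclasses import dataclass

MIN_SCORE = 0.70               # from the checklist above
MAX_SECONDS_CHUNKS_ONLY = 2.0  # from the checklist above; applies to chunks-only mode


@dataclass
class PipelineTestResult:
    scores: list            # similarity score of each returned chunk
    processing_seconds: float
    llm_enabled: bool = False


def check_test_result(result):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not result.scores:
        problems.append("no chunks were returned")
    low = [s for s in result.scores if s < MIN_SCORE]
    if low:
        problems.append(f"{len(low)} chunk(s) scored below {MIN_SCORE:.2f}")
    # The 2-second guideline is stated for chunks-only queries, so skip it for LLM answers.
    if not result.llm_enabled and result.processing_seconds >= MAX_SECONDS_CHUNKS_ONLY:
        problems.append(
            f"processing took {result.processing_seconds:.2f}s "
            f"(expected under {MAX_SECONDS_CHUNKS_ONLY:.0f}s)"
        )
    return problems


if __name__ == "__main__":
    result = PipelineTestResult(scores=[0.82, 0.55], processing_seconds=2.5)
    for problem in check_test_result(result):
        print("FAIL:", problem)
```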

Query Template

Customize how queries are processed. The default template works for most cases:

Find information that answers: {query}

For specialized use cases:

# Technical support
Find troubleshooting steps or solutions for: {query}

# Legal research
Find relevant clauses, sections, or precedents for: {query}

# Product search
Find product features, specifications, or comparisons for: {query}
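The templates above all rely on the {query} placeholder. The following sketch shows one way such a template could be filled in on the client side. The render_query helper and the dictionary keys are hypothetical, and the product may perform this substitution differently.

```python
DEFAULT_TEMPLATE = "Find information that answers: {query}"

# Template strings copied from the examples above; the keys are illustrative.
TEMPLATES = {
    "technical_support": "Find troubleshooting steps or solutions for: {query}",
    "legal_research": "Find relevant clauses, sections, or precedents for: {query}",
    "product_search": "Find product features, specifications, or comparisons for: {query}",
}


def render_query(template, query):
    """Substitute the user's query into the template's {query} placeholder."""
    if "{query}" not in template:
        raise ValueError("template must contain the {query} placeholder")
    # str.replace is used instead of str.format so that braces typed by the
    # user (for example in a code snippet) cannot break the substitution.
    return template.replace("{query}", query.strip())


if __name__ == "__main__":
    print(render_query(DEFAULT_TEMPLATE, "how do I rotate an API key?"))
    print(render_query(TEMPLATES["technical_support"], "error 502 on upload"))
```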

Full Pipeline Config Example

{
  "embedding": {
    "model": "text-embedding-3-small",
    "enable_bm25": true,
    "similarity": "cosine"
  },
  "retrieval": {
    "search_method": "hybrid",
    "limit": 5,
    "threshold": 0.7
  },
  "llm": {
    "enabled": false
  }
}
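To illustrate how the retrieval settings in the example above might shape results, the sketch below loads the same JSON and applies limit and threshold to a list of scored chunks. It assumes threshold is a minimum similarity score and limit is the maximum number of chunks returned; the select_chunks helper is hypothetical, and the product's actual behavior may differ.

```python
import json

# Same configuration as the full example above.
CONFIG_JSON = """
{
  "embedding": {
    "model": "text-embedding-3-small",
    "enable_bm25": true,
    "similarity": "cosine"
  },
  "retrieval": {
    "search_method": "hybrid",
    "limit": 5,
    "threshold": 0.7
  },
  "llm": {
    "enabled": false
  }
}
"""


def select_chunks(scored_chunks, retrieval):
    """Keep chunks at or above the threshold, best first, capped at the limit."""
    kept = [(text, score) for text, score in scored_chunks if score >= retrieval["threshold"]]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[: retrieval["limit"]]


if __name__ == "__main__":
    config = json.loads(CONFIG_JSON)
    candidates = [("chunk A", 0.91), ("chunk B", 0.65), ("chunk C", 0.74)]
    for text, score in select_chunks(candidates, config["retrieval"]):
        print(f"{score:.2f}  {text}")
```

With a threshold of 0.7, the chunk scored 0.65 is dropped even though fewer than five results remain, which shows why a threshold set too high can hide valid matches.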

Next Step

After configuring your pipeline, move to API Endpoints to get your API key and start integrating.

Do
  • Test with real queries before deploying
  • Start with Hybrid search method
  • Use text-embedding-3-small for most cases
  • Enable BM25 for better keyword matching
Don't
  • Deploy without testing the pipeline
  • Enable LLM answers before verifying retrieval
  • Use cosine for everything without testing other similarity functions
  • Set the threshold too high (valid results get filtered out)