RAG Wizard

Testing & Evaluation

Use built-in tools to test chunking, similarity scores, and pipeline quality.

Before deploying your RAG system, use the built-in testing tools to verify retrieval quality and tune your configuration.

Chunk Review Tool

Preview how your documents are split into chunks and identify issues before processing.

How to Use

  1. Navigate to your RAG project dashboard
  2. Click Chunk Review in the sidebar
  3. Select a document to preview
  4. Browse chunks and check for:
    • Content that spans chunk boundaries awkwardly
    • Chunks that are too small or too large
    • Missing context at chunk edges

What to Look For

Issue            | Symptom                      | Fix
Chunks too large | Multiple topics in one chunk | Decrease chunk_size
Chunks too small | Incomplete sentences         | Increase chunk_size or chunk_min_size
Broken sentences | Text cut mid-sentence        | Enable sentence boundary respect
Lost context     | Chunks feel disconnected     | Increase chunk_overlap

Review 3-5 chunks or more, drawn from different documents, before processing your full collection.
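The checks in the table above can be roughly approximated in code. The following is a minimal sketch, assuming chunk_size and chunk_overlap are measured in characters (the product may count tokens instead). The function names chunk_text and review_chunk and the chunk_max_size parameter are hypothetical, introduced only for illustration.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=40):
    """Split text into character windows that overlap by chunk_overlap.

    Assumption: sizes are in characters, not tokens.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def review_chunk(chunk, chunk_min_size=50, chunk_max_size=1000):
    """Return a list of likely issues for one chunk (empty list = looks fine).

    chunk_max_size is a hypothetical threshold, not a documented setting.
    """
    issues = []
    stripped = chunk.strip()
    if len(stripped) < chunk_min_size:
        issues.append("too small")
    if len(stripped) > chunk_max_size:
        issues.append("too large")
    # Crude heuristic: a chunk that does not end in sentence punctuation
    # was probably cut mid-sentence.
    if stripped and not stripped.endswith((".", "!", "?")):
        issues.append("ends mid-sentence")
    return issues
```

A heuristic like this only narrows down which chunks to read; judging whether a chunk mixes topics or has lost context still needs a manual look in the Chunk Review tool.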

Similarity Score Test

Test how well your embedding model retrieves relevant chunks for specific queries.

How to Use

  1. Navigate to Similarity Test in the sidebar
  2. Enter a test query (use real questions your users will ask)
  3. Click Test
  4. Review results:
     Query: "What is the refund policy for enterprise plans?"

     Results:
       [0.92] "Enterprise customers may request a full refund within 30 days..."
       [0.87] "Refund processing takes 5-7 business days for enterprise accounts..."
       [0.74] "Standard refund policy applies to all subscription tiers..."
       [0.61] "Contact support for billing inquiries..."

Score Interpretation

Score Range | Quality   | Action
0.85 - 1.00 | Excellent | No action needed
0.70 - 0.84 | Good      | Acceptable for production
0.50 - 0.69 | Fair      | Tune chunking or embedding model
0.00 - 0.49 | Poor      | Review content quality and chunking
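If you record scores outside the tool, the bands above are easy to apply in a script. This is a small sketch; score_quality is a hypothetical helper name, and it assumes scores fall in the 0.00 to 1.00 range shown in the table.

```python
def score_quality(score):
    """Map a similarity score to the quality bands in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # Lower bounds are used, so a value such as 0.845 falls into "Good".
    if score >= 0.85:
        return "Excellent"
    if score >= 0.70:
        return "Good"
    if score >= 0.50:
        return "Fair"
    return "Poor"


# Label the scores from the sample query above.
for score in [0.92, 0.87, 0.74, 0.61]:
    print(f"{score:.2f} {score_quality(score)}")
```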

Improving Low Scores

  1. Check chunk size: large chunks dilute meaning; small chunks lose context
  2. Try a different embedding model: text-embedding-3-large may help for complex queries
  3. Enable BM25: keyword matching catches exact terms that semantic search misses
  4. Review content quality: ensure documents contain the information you're searching for
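To see why keyword matching helps with exact terms, here is a generic Okapi BM25 scorer in plain Python. It is only an illustration of the technique: how the product implements BM25 or combines it with semantic scores is not stated here, and the tokenizer, the k1 and b defaults, and the sample documents are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical documents: only the first one contains the exact identifier.
docs = [
    "Refund requests for order SKU-4471 are handled by billing.",
    "Contact support for general questions.",
    "Enterprise plans include priority support.",
]
scores = bm25_scores("SKU-4471 refund", docs)
```

An identifier such as an order or part number often has no useful meaning to an embedding model, but BM25 rewards the exact match, which is why the two methods complement each other.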

Pipeline Test Tool

End-to-end test of your full RAG pipeline, including retrieval and LLM answer generation.

How to Use

  1. Navigate to Pipeline Test in the sidebar
  2. Enter a test query
  3. Click Run Pipeline
  4. Review:
    • Retrieved chunks and scores
    • Processing time
    • LLM answer (if enabled)
    • Source citations

Metrics to Track

Metric          | Target             | Notes
Processing time | < 2s (chunks only) | LLM adds 1-3s
Top-1 score     | > 0.70             | Most relevant chunk
Top-3 recall    | > 0.60             | Relevant chunks in top 3
Answer accuracy | Manual review      | Does LLM cite sources?
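The two score-based metrics can be computed from exported results. The sketch below assumes results arrive as (chunk_id, score) pairs sorted by descending score, and it takes top-3 recall to mean the share of known-relevant chunks that appear in the top 3 for a query. The tool may define recall differently; the function names and chunk IDs are hypothetical.

```python
def top1_score(results):
    """Score of the highest-ranked chunk, or 0.0 if nothing was retrieved."""
    return results[0][1] if results else 0.0


def top_k_recall(results, relevant_ids, k=3):
    """Fraction of the known-relevant chunk IDs found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_ids = {chunk_id for chunk_id, _ in results[:k]}
    return len(top_ids & relevant) / len(relevant)


# Hypothetical results for one query, already sorted by score.
results = [("c1", 0.92), ("c2", 0.87), ("c3", 0.74), ("c4", 0.61)]
relevant = {"c1", "c2", "c4"}  # chunks a reviewer marked as relevant

best = top1_score(results)
recall = top_k_recall(results, relevant, k=3)
meets_targets = best > 0.70 and recall > 0.60
```

Here two of the three relevant chunks land in the top 3, so recall is 2/3 and both targets from the table are met.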

Query Testing Best Practices

Do
  • Test with real user queries, not just simple keywords
  • Include edge cases and ambiguous questions
  • Test after every configuration change
  • Document test queries and expected results
Don't
  • Test with only one query
  • Assume high scores mean good answers
  • Skip testing after changing chunking settings
  • Deploy without manual review of answers

Iteration Workflow

  1. Test with 5-10 representative queries
  2. If scores are low, adjust chunking or embedding model
  3. Re-test after each change
  4. When all queries meet your score targets and the answers pass manual review, deploy to production
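The workflow above amounts to a small regression suite that you re-run after every configuration change. The following sketch assumes you have some way to call retrieval from code; the retrieve callable, the fake_retrieve stub, the test-case format, and the chunk IDs are all hypothetical stand-ins, since this page only documents the interactive tools.

```python
def run_test_suite(retrieve, test_cases, min_top1=0.70, k=3):
    """Return the queries that fail: low top-1 score or expected chunk not in top k.

    retrieve(query) must return (chunk_id, score) pairs sorted by descending score.
    """
    failures = []
    for case in test_cases:
        results = retrieve(case["query"])
        top_score = results[0][1] if results else 0.0
        top_ids = [chunk_id for chunk_id, _ in results[:k]]
        if top_score < min_top1 or case["expected_chunk"] not in top_ids:
            failures.append(case["query"])
    return failures


def fake_retrieve(query):
    """Stub with canned results, standing in for a real retrieval call."""
    canned = {
        "What is the refund policy for enterprise plans?": [
            ("refund-01", 0.92),
            ("refund-02", 0.87),
        ],
        "How do I rotate an API key?": [("billing-07", 0.55)],
    }
    return canned.get(query, [])


# Documented test queries with the chunk each one is expected to retrieve.
TEST_CASES = [
    {"query": "What is the refund policy for enterprise plans?", "expected_chunk": "refund-01"},
    {"query": "How do I rotate an API key?", "expected_chunk": "security-03"},
]

failures = run_test_suite(fake_retrieve, TEST_CASES)
```

Keeping the test cases in a file next to your configuration also covers the earlier advice to document test queries and expected results. A passing suite only shows that retrieval behaves as expected; generated answers still need manual review.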