RAG Wizard

Testing & Evaluation

Use built-in tools to test chunking, similarity scores, and pipeline quality.

Before deploying your RAG system, use the built-in testing tools to verify retrieval quality and tune your configuration.

Chunk Review Tool

Preview how your documents are split into chunks and identify issues before processing.

How to Use

Navigate to your RAG project dashboard
Click Chunk Review in the sidebar
Select a document to preview
Browse chunks and check for:
- Content that spans chunk boundaries awkwardly
- Chunks that are too small or too large
- Missing context at chunk edges

What to Look For

Issue	Symptom	Fix
Chunks too large	Multiple topics in one chunk	Decrease `chunk_size`
Chunks too small	Incomplete sentences	Increase `chunk_size` or `chunk_min_size`
Broken sentences	Text cut mid-sentence	Enable the option that respects sentence boundaries
Lost context	Chunks feel disconnected	Increase `chunk_overlap`

Review at least 3 to 5 chunks, taken from different documents, before processing your full collection.
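To make the effect of `chunk_size`, `chunk_overlap`, and `chunk_min_size` more concrete, here is a minimal sketch of a sliding-window splitter. It is not the tool's actual splitter: the function name `split_into_chunks` is hypothetical, sizes are counted in characters (the product may count tokens instead), and `chunk_min_size` is assumed to mean "fold a too-short final chunk into the previous one".

```python
def split_into_chunks(text, chunk_size=200, chunk_overlap=40, chunk_min_size=50):
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    # Assumed meaning of chunk_min_size: merge a too-short tail into its neighbour,
    # skipping the part of the tail that the previous chunk already contains.
    if len(chunks) > 1 and len(chunks[-1]) < chunk_min_size:
        tail = chunks.pop()
        chunks[-1] += tail[chunk_overlap:]
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(500))  # 500 characters of filler
print([len(c) for c in split_into_chunks(sample)])  # [200, 200, 180]
```

Raising `chunk_overlap` makes neighbouring chunks share more text, which is why it helps with the "lost context" symptom in the table above; lowering `chunk_size` produces more, narrower chunks.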

Similarity Score Test

Test how well your embedding model retrieves relevant chunks for specific queries.

How to Use

Navigate to Similarity Test in the sidebar
Enter a test query (use real questions your users will ask)
Click Test
Review results:

Query: "What is the refund policy for enterprise plans?"

Results:
  [0.92] "Enterprise customers may request a full refund within 30 days..."
  [0.87] "Refund processing takes 5-7 business days for enterprise accounts..."
  [0.74] "Standard refund policy applies to all subscription tiers..."
  [0.61] "Contact support for billing inquiries..."
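The bracketed numbers are similarity scores between the query and each chunk. This page does not say which measure is used; cosine similarity between embedding vectors is a common choice, so the sketch below assumes it. The vectors are toy values, and `cosine_similarity` and `rank_chunks` are illustrative names, not part of the product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs):
    """Return (score, chunk_id) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    return sorted(scored, reverse=True)


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
query = [0.9, 0.1, 0.0]
chunks = {"refund-terms": [0.8, 0.2, 0.1], "billing-contact": [0.1, 0.3, 0.9]}
for score, chunk_id in rank_chunks(query, chunks):
    print(f"[{score:.2f}] {chunk_id}")
```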

Score Interpretation

Score Range	Quality	Action
0.85 - 1.00	Excellent	No action needed
0.70 - 0.84	Good	Acceptable for production
0.50 - 0.69	Fair	Tune chunking or embedding model
0.00 - 0.49	Poor	Review content quality and chunking
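If you log scores from many test queries, a small helper can apply the table above automatically. This is a sketch: `interpret_score` is a hypothetical name, and it treats each range's lower bound as a threshold so that values falling between the printed ranges (for example 0.845) still get a label.

```python
def interpret_score(score):
    """Map a similarity score to the (quality, action) pair from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.85:
        return ("Excellent", "No action needed")
    if score >= 0.70:
        return ("Good", "Acceptable for production")
    if score >= 0.50:
        return ("Fair", "Tune chunking or embedding model")
    return ("Poor", "Review content quality and chunking")


# Scores from the refund-policy example earlier in this section.
for s in (0.92, 0.87, 0.74, 0.61):
    quality, action = interpret_score(s)
    print(f"{s:.2f}: {quality} ({action})")
```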

Improving Low Scores

Check chunk size: large chunks dilute meaning; small chunks lose context
Try a different embedding model: text-embedding-3-large may help for complex queries
Enable BM25: keyword matching catches exact terms that semantic search misses
Review content quality: ensure your documents actually contain the information you're searching for
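To illustrate why BM25 helps with exact terms, the sketch below scores documents with the standard Okapi BM25 formula (using a common non-negative IDF variant). It is not the product's implementation: the function name `bm25_scores`, the whitespace tokenizer, and the default `k1` and `b` values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n
    doc_freq = Counter()  # number of documents containing each term
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents need more matches to score well.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "Refund policy for enterprise plans",
    "Contact support for billing inquiries",
    "Error code E4012 means the payment token expired",
]
print(bm25_scores("E4012", docs))  # only the third document scores above zero
```

An identifier such as `E4012` carries little semantic signal for an embedding model, but a keyword scorer matches it exactly, which is the case the BM25 suggestion above is meant to cover.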

Pipeline Test Tool

Run an end-to-end test of your full RAG pipeline, including retrieval and LLM answer generation.

How to Use

Navigate to Pipeline Test in the sidebar
Enter a test query
Click Run Pipeline
Review:
- Retrieved chunks and scores
- Processing time
- LLM answer (if enabled)
- Source citations

Metrics to Track

Metric	Target	Notes
Processing time	< 2s (chunks only)	LLM adds 1-3s
Top-1 score	> 0.70	Most relevant chunk
Top-3 recall	> 0.60	Relevant chunks in top 3
Answer accuracy	Manual review	Is the answer correct, and does the LLM cite its sources?
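The score-based metrics can be computed from recorded test runs. The sketch below makes several assumptions: each run stores the ranked `(chunk_id, score)` results plus a hand-labelled set of relevant chunk IDs, and "top-3 recall" is read as the share of a query's relevant chunks that appear in the top three results, averaged over queries. The tool may define it differently, and all names here are illustrative.

```python
def top1_score(results):
    """Score of the best-ranked chunk, or 0.0 if nothing was retrieved."""
    return results[0][1] if results else 0.0


def recall_at_k(results, relevant_ids, k=3):
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    top_ids = {chunk_id for chunk_id, _ in results[:k]}
    return len(top_ids & relevant_ids) / len(relevant_ids)


def summarize(runs, k=3):
    """Average both metrics over a list of recorded test runs."""
    n = len(runs)
    return {
        "mean_top1": sum(top1_score(r["results"]) for r in runs) / n,
        "mean_recall_at_k": sum(recall_at_k(r["results"], r["relevant"], k) for r in runs) / n,
    }


runs = [
    {"results": [("c1", 0.92), ("c2", 0.87), ("c3", 0.74), ("c4", 0.61)],
     "relevant": {"c1", "c2"}},
    {"results": [("c9", 0.66), ("c7", 0.58), ("c5", 0.41)],
     "relevant": {"c7", "c8"}},
]
print(summarize(runs))
```

Comparing the averages against the targets in the table (> 0.70 and > 0.60) gives a quick pass/fail signal, but it does not replace the manual answer review.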

Query Testing Best Practices

Do

Test with real user queries, not just simple keywords
Include edge cases and ambiguous questions
Test after every configuration change
Document test queries and expected results

Don't

Test with only one query
Assume high scores mean good answers
Skip testing after changing chunking settings
Deploy without manual review of answers

Iteration Workflow

Test with 5-10 representative queries
If scores are low, adjust chunking or embedding model
Re-test after each change
When all queries meet your score targets and the answers pass manual review, deploy to production
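The workflow above can be turned into a repeatable regression check. The sketch below is one possible shape, not a documented feature: `QueryCase`, `run_suite`, and the `retrieve` callable are hypothetical, and `retrieve` stands in for however you call your pipeline (it is assumed to return `(score, chunk_text)` pairs, best first). The default threshold of 0.70 mirrors the Top-1 target from the metrics table.

```python
from dataclasses import dataclass


@dataclass
class QueryCase:
    query: str
    expected_phrase: str      # text the top chunk is expected to contain
    min_score: float = 0.70   # Top-1 target from the metrics table


def run_suite(cases, retrieve):
    """Run every case and return a list of (query, reason) failures."""
    failures = []
    for case in cases:
        results = retrieve(case.query)
        if not results:
            failures.append((case.query, "no results"))
            continue
        score, text = results[0]
        if score < case.min_score:
            failures.append((case.query, f"top-1 score {score:.2f} is below {case.min_score:.2f}"))
        elif case.expected_phrase.lower() not in text.lower():
            failures.append((case.query, "expected phrase missing from top chunk"))
    return failures


# Stand-in retriever with canned results so the sketch runs on its own.
def fake_retrieve(query):
    canned = {
        "What is the refund policy for enterprise plans?": [
            (0.92, "Enterprise customers may request a full refund within 30 days..."),
        ],
        "How do I rotate API keys?": [(0.55, "Contact support for billing inquiries...")],
    }
    return canned.get(query, [])


cases = [
    QueryCase("What is the refund policy for enterprise plans?", "full refund"),
    QueryCase("How do I rotate API keys?", "rotate"),
]
for query, reason in run_suite(cases, fake_retrieve):
    print(f"FAIL: {query} ({reason})")
```

Keeping the cases in version control also covers the "document test queries and expected results" practice, and re-running the suite after each configuration change shows quickly whether a tweak helped or hurt. A passing suite only checks retrieval; answers still need a manual read before deployment.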
