
Use built-in tools to test chunking, similarity scores, and pipeline quality.
Before deploying your RAG system, use the built-in testing tools to verify retrieval quality and tune your configuration.
Preview how your documents are split into chunks and identify issues before processing.
| Issue | Symptom | Fix |
|---|---|---|
| Chunks too large | Multiple topics in one chunk | Decrease chunk_size |
| Chunks too small | Incomplete sentences | Increase chunk_size or chunk_min_size |
| Broken sentences | Text cut mid-sentence | Enable the option that respects sentence boundaries |
| Lost context | Chunks feel disconnected | Increase chunk_overlap |
Review at least 3-5 chunks from different documents before processing your full collection.
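
The effect of the settings in the table can be explored offline with a small preview script. The following is a minimal sketch, not the built-in tool: it assumes `chunk_size`, `chunk_overlap`, and `chunk_min_size` are measured in characters, and the function names `preview_chunks` and `flag_issues` are hypothetical helpers introduced for illustration.

```python
def preview_chunks(text, chunk_size=200, chunk_overlap=40, chunk_min_size=50):
    """Split text into overlapping character windows (assumed semantics)."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    spans = []
    for start in range(0, len(text), step):
        end = min(start + chunk_size, len(text))
        spans.append((start, end))
        if end == len(text):
            break
    # Fold a final chunk shorter than chunk_min_size into the previous one.
    if len(spans) > 1 and spans[-1][1] - spans[-1][0] < chunk_min_size:
        spans.pop()
        spans[-1] = (spans[-1][0], len(text))
    return [text[s:e] for s, e in spans]


def flag_issues(chunk):
    """Return simple warnings for a chunk; only checks for a cut-off ending."""
    issues = []
    if not chunk.rstrip().endswith((".", "!", "?")):
        issues.append("ends mid-sentence")
    return issues


if __name__ == "__main__":
    sample = (
        "Refunds are available within 30 days. "
        "Processing takes 5-7 business days. "
        "Contact support for billing questions."
    )
    for i, chunk in enumerate(preview_chunks(sample, 60, 15, 20)):
        print(i, len(chunk), flag_issues(chunk), repr(chunk))
```

Running it on a few representative documents with different settings shows quickly whether chunks end mid-sentence or lose context at the boundaries.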
Test how well your embedding model retrieves relevant chunks for specific queries.
Query: "What is the refund policy for enterprise plans?"
Results:
[0.92] "Enterprise customers may request a full refund within 30 days..."
[0.87] "Refund processing takes 5-7 business days for enterprise accounts..."
[0.74] "Standard refund policy applies to all subscription tiers..."
[0.61] "Contact support for billing inquiries..."
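
The bracketed values in the results above are similarity scores between the query and each chunk. The source does not state which metric produces them; the sketch below assumes cosine similarity over embedding vectors, which is a common choice. The vectors passed in would come from your embedding model, and `rank_chunks` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, top_k=4):
    """Score (text, vector) pairs against the query and return the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunk_vecs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```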
| Score Range | Quality | Action |
|---|---|---|
| 0.85 - 1.00 | Excellent | No action needed |
| 0.70 - 0.84 | Good | Acceptable for production |
| 0.50 - 0.69 | Fair | Tune chunking or embedding model |
| 0.00 - 0.49 | Poor | Review content quality and chunking |
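
The score ranges in the table can be applied automatically when reviewing many queries. This is a small sketch that maps a score to the quality label and action from the table; the function name `score_quality` is hypothetical, and scores are assumed to lie between 0 and 1.

```python
# (lower bound, quality, action), taken from the score range table.
SCORE_BANDS = [
    (0.85, "Excellent", "No action needed"),
    (0.70, "Good", "Acceptable for production"),
    (0.50, "Fair", "Tune chunking or embedding model"),
    (0.00, "Poor", "Review content quality and chunking"),
]


def score_quality(score):
    """Return (quality, action) for a similarity score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for lower, quality, action in SCORE_BANDS:
        if score >= lower:
            return quality, action


if __name__ == "__main__":
    # Scores from the refund-policy example query.
    for s in (0.92, 0.87, 0.74, 0.61):
        print(s, *score_quality(s))
```

For the example query, the four results fall into the Excellent, Excellent, Good, and Fair bands respectively.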
For complex queries, a larger embedding model such as text-embedding-3-large may help.
Run an end-to-end test of your full RAG pipeline, including retrieval and LLM answer generation.
| Metric | Target | Notes |
|---|---|---|
| Processing time | < 2s (chunks only) | LLM generation adds 1-3s |
| Top-1 score | > 0.70 | Most relevant chunk |
| Top-3 recall | > 0.60 | Relevant chunks in top 3 |
| Answer accuracy | Manual review | Does the LLM cite its sources? |
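
The numeric targets in the table can be checked over a small set of test queries. The sketch below is illustrative only and makes several assumptions: top-3 recall is taken to mean the fraction of known-relevant chunks that appear in the top three results, metrics are averaged across queries, and the 2-second limit applies to the slowest retrieval-only run. `QueryResult` and `check_targets` are hypothetical names, and answer accuracy still needs manual review.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    scores: list         # similarity scores, sorted from highest to lowest
    retrieved_ids: list  # chunk ids in the same order as scores
    relevant_ids: set    # chunk ids a human marked as relevant
    seconds: float       # retrieval time, without LLM generation


def top3_recall(result):
    """Fraction of relevant chunks found in the top three results."""
    if not result.relevant_ids:
        return 0.0
    hits = set(result.retrieved_ids[:3]) & result.relevant_ids
    return len(hits) / len(result.relevant_ids)


def check_targets(results):
    """Compare averaged metrics against the targets from the table."""
    n = len(results)
    avg_top1 = sum(r.scores[0] for r in results) / n
    avg_recall = sum(top3_recall(r) for r in results) / n
    slowest = max(r.seconds for r in results)
    return {
        "top1_ok": avg_top1 > 0.70,
        "recall_ok": avg_recall > 0.60,
        "time_ok": slowest < 2.0,
    }
```

A failing `top1_ok` or `recall_ok` points back to chunking or embedding settings, while a failing `time_ok` points to retrieval performance.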