Guided Mind v2.4

Simple RAG — Indexing Pipeline

How documents are chunked and embedded for vector search.

Overview

Simple RAG indexing converts raw documents into searchable vector embeddings through a linear pipeline: parse → preprocess → chunk → embed → store.
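
As a rough illustration of that linear flow, the sketch below chains five stage functions in order. The names `index_document` and `IndexedChunk`, and the toy lambdas standing in for each stage, are hypothetical and chosen only to show how data moves from one step to the next; they are not Guided Mind's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class IndexedChunk:
    """One stored unit: the chunk text plus its embedding vector."""
    text: str
    vector: List[float]


def index_document(
    raw: bytes,
    parse: Callable[[bytes], str],
    preprocess: Callable[[str], str],
    chunk: Callable[[str], Sequence[str]],
    embed: Callable[[str], List[float]],
    store: Callable[[IndexedChunk], None],
) -> int:
    """Run parse -> preprocess -> chunk -> embed -> store; return chunks stored."""
    text = preprocess(parse(raw))
    count = 0
    for piece in chunk(text):
        store(IndexedChunk(text=piece, vector=embed(piece)))
        count += 1
    return count


# Toy stand-ins for each stage (assumptions, not real parsers or models).
index: List[IndexedChunk] = []
stored = index_document(
    b"First paragraph.\n\nSecond paragraph.",
    parse=lambda b: b.decode("utf-8"),
    preprocess=str.strip,
    chunk=lambda t: t.split("\n\n"),
    embed=lambda c: [float(len(c))],  # placeholder "embedding": text length
    store=index.append,
)
```

Passing the stages in as callables keeps each step swappable, which mirrors how the settings below change one stage without touching the others.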

Pipeline Steps

1. Document Parser

Extracts raw text from uploaded files (PDF, TXT, MD, CSV, DOCX, XLS).
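
A parser of this kind typically dispatches on the file extension. The sketch below is a minimal, standard-library-only version that handles TXT, MD, and CSV; the function name `parse_document` and the choice to join CSV cells with commas are assumptions. PDF, DOCX, and XLS need third-party libraries and are deliberately left out here.

```python
import csv
import io
from pathlib import Path


def parse_document(filename: str, data: bytes) -> str:
    """Return plain text for the subset of formats the standard library can read."""
    ext = Path(filename).suffix.lower()
    if ext in {".txt", ".md"}:
        return data.decode("utf-8", errors="replace")
    if ext == ".csv":
        text = data.decode("utf-8", errors="replace")
        rows = csv.reader(io.StringIO(text, newline=""))
        # One line per row, cells separated by ", " (an illustrative choice).
        return "\n".join(", ".join(row) for row in rows)
    # PDF, DOCX, and XLS would each need a dedicated library.
    raise ValueError(f"no parser available in this sketch for {ext!r}")
```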

2. Text Preprocessing (Optional Settings)

Setting            Effect                      Default
Remove Non-ASCII   Strips special characters   false
Lowercase          Normalizes text case        false
Collapse Spaces    Removes extra whitespace    false
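
The three settings can be pictured as independent boolean switches applied in sequence. The sketch below is one possible reading: the `PreprocessOptions` class, the order of application, and the decision to collapse only spaces and tabs (leaving newlines intact so paragraph breaks survive for chunking) are all assumptions rather than documented behavior.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessOptions:
    # All three default to false, matching the table above.
    remove_non_ascii: bool = False
    lowercase: bool = False
    collapse_spaces: bool = False


def preprocess(text: str, opts: PreprocessOptions = PreprocessOptions()) -> str:
    if opts.remove_non_ascii:
        # Drop every character that cannot be encoded as ASCII.
        text = text.encode("ascii", errors="ignore").decode("ascii")
    if opts.lowercase:
        text = text.lower()
    if opts.collapse_spaces:
        # Assumption: collapse runs of spaces/tabs only, keep newlines.
        text = re.sub(r"[ \t]+", " ", text).strip()
    return text


cleaned = preprocess(
    "Héllo   World",
    PreprocessOptions(remove_non_ascii=True, lowercase=True, collapse_spaces=True),
)
```

With every option left at its default, the text passes through unchanged. Note that stripping non-ASCII characters removes accented letters entirely rather than transliterating them, which can matter for non-English documents.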

3. Chunking Strategy

Strategy     When to Use
Fixed Size   General purpose, predictable chunks
Recursive    Preserves paragraph structure
Semantic     LLM-determined natural boundaries
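
To make the difference between the first two strategies concrete, here is a minimal sketch of each. The function names, the character-based sizing, the `overlap` parameter, and the separator order are assumptions for illustration; the product may count tokens instead of characters. The Semantic strategy depends on a model call and is not sketched.

```python
from typing import List, Sequence


def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> List[str]:
    """Cut text into windows of `size` characters, sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
        start += size - overlap
    return chunks


def recursive_chunks(
    text: str, size: int, separators: Sequence[str] = ("\n\n", "\n", " ")
) -> List[str]:
    """Split on the coarsest separator first; recurse only into oversized pieces."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        return fixed_size_chunks(text, size)  # last resort: hard cut
    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    buffer = ""
    for part in text.split(head):
        candidate = part if not buffer else buffer + head + part
        if len(candidate) <= size:
            buffer = candidate  # keep merging neighbours while they fit
            continue
        if buffer:
            chunks.append(buffer)
        if len(part) <= size:
            buffer = part
        else:
            chunks.extend(recursive_chunks(part, size, rest))
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks


sample = "Alpha beta.\n\nGamma delta epsilon zeta.\n\nEnd."
fixed = fixed_size_chunks("abcdefghij", size=4, overlap=1)
recursive = recursive_chunks(sample, size=20)
```

In this sketch the fixed-size splitter may cut mid-word, while the recursive splitter only falls back to finer separators (and finally a hard cut) when a paragraph does not fit within `size`.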

4. Embedding Model

Converts each chunk into a dense vector. The model is selected in the wizard (e.g., text-embedding-3-small).
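
Embedding backends are usually called in batches rather than one chunk at a time. The sketch below shows that batching loop around a pluggable function; `embed_chunks`, `fake_embed_batch`, and the `batch_size` default are hypothetical. In a real setup the callable would wrap the chosen provider's client, and the vectors would have far more dimensions than the two used here.

```python
from typing import Callable, List, Sequence

Vector = List[float]
EmbedBatchFn = Callable[[Sequence[str]], List[Vector]]


def embed_chunks(
    chunks: Sequence[str], embed_batch: EmbedBatchFn, batch_size: int = 64
) -> List[Vector]:
    """Embed chunks in batches, preserving order (one vector per chunk)."""
    vectors: List[Vector] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("backend returned a different number of vectors")
        vectors.extend(result)
    return vectors


def fake_embed_batch(texts: Sequence[str]) -> List[Vector]:
    # Stand-in for a real model: [character count, space count] per text.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


vectors = embed_chunks(["a b", "hello", "x y z"], fake_embed_batch, batch_size=2)
```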

5. Vector Database

Stores embeddings with chunk metadata for fast similarity search.
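
The idea of storing vectors alongside chunk metadata and ranking them by similarity can be sketched with a small in-memory store. Everything here (`InMemoryVectorStore`, `Record`, the `source` and `position` metadata keys, and cosine similarity as the metric) is an assumption for illustration. This version scans every record, whereas production vector databases generally rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Record:
    vector: List[float]
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    def __init__(self) -> None:
        self.records: List[Record] = []

    def add(self, vector: Sequence[float], text: str, **metadata: str) -> None:
        self.records.append(Record(list(vector), text, dict(metadata)))

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[float, Record]]:
        # Brute-force scan: score every record, then keep the k best.
        scored = [(cosine(query, r.vector), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([1.0, 0.0], "chunk about cats", source="pets.md", position="0")
store.add([0.0, 1.0], "chunk about taxes", source="finance.pdf", position="3")
top_score, top = store.search([0.9, 0.1], k=1)[0]
```

Keeping the source file and chunk position next to each vector is what lets a retrieval step cite where a matching chunk came from.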