RAG Types

Simple RAG — Indexing Pipeline

How documents are chunked and embedded for vector search.

Overview

Simple RAG indexing converts raw documents into searchable vector embeddings through a linear pipeline: parse → preprocess → chunk → embed → store.
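The linear flow above can be sketched as a chain of functions in which each stage consumes the previous stage's output. This is a minimal illustration only: the stage bodies, the 20-character chunk size, and the two-number placeholder "vector" are assumptions made for the example, not the product's implementation.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def run_pipeline(stages: Iterable[Stage], raw: Any) -> Any:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda value, stage: stage(value), stages, raw)


# Hypothetical stand-in stages, kept trivial to show the data flow.
def parse(raw: bytes) -> str:
    return raw.decode("utf-8", errors="replace")


def preprocess(text: str) -> str:
    # Collapse runs of whitespace (assumed preprocessing for this sketch).
    return " ".join(text.split())


def chunk(text: str) -> list[str]:
    size = 20  # assumed chunk size in characters
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(chunks: list[str]) -> list[tuple[str, list[float]]]:
    # Placeholder vector [length, vowel count]; a real pipeline calls an embedding model.
    return [
        (c, [float(len(c)), float(sum(ch in "aeiou" for ch in c.lower()))])
        for c in chunks
    ]


def store(records: list[tuple[str, list[float]]]) -> dict[int, dict]:
    return {i: {"text": t, "vector": v} for i, (t, v) in enumerate(records)}


index = run_pipeline(
    [parse, preprocess, chunk, embed, store],
    b"Hello   world.\n\nThis is a  test document.",
)
```

Keeping each stage a plain function makes it easy to swap one out (for example, a different chunking strategy) without touching the others.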

The parse step extracts raw text from uploaded files (PDF, TXT, MD, CSV, DOCX, XLS).
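As a rough sketch of text extraction, the function below dispatches on file extension using only the Python standard library. The name `extract_text` is hypothetical, and only TXT, MD, and CSV are handled here; PDF, DOCX, and XLS would need third-party parsers, which this example deliberately leaves out.

```python
import csv
from pathlib import Path


def extract_text(path: Path) -> str:
    """Return the raw text of a file, chosen by extension (hypothetical helper)."""
    suffix = path.suffix.lower()
    if suffix in {".txt", ".md"}:
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".csv":
        # Flatten each CSV row into one comma-separated line of text.
        with path.open(newline="", encoding="utf-8", errors="replace") as handle:
            return "\n".join(", ".join(row) for row in csv.reader(handle))
    # Binary formats (PDF, DOCX, XLS) need dedicated parsers not shown here.
    raise ValueError(f"no parser registered for {suffix!r}")
```

Raising on unknown extensions, rather than returning an empty string, keeps unsupported uploads from being silently indexed as empty documents.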

The chunk step splits the extracted text using one of three strategies:

Strategy	When to Use
Fixed Size	General purpose, predictable chunks
Recursive	Preserves paragraph structure
Semantic	LLM-determined natural boundaries
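The first two strategies can be sketched in a few lines. This is an illustrative version under stated assumptions: sizes are measured in characters, the separator order (blank line, newline, space) is a common convention rather than something the source specifies, and both function names are hypothetical. Semantic chunking is omitted because it depends on an LLM call.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Cut text into windows of `size` characters, optionally overlapping."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start, step = [], 0, size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def recursive_chunks(
    text: str, max_size: int, separators: tuple[str, ...] = ("\n\n", "\n", " ")
) -> list[str]:
    """Split on the coarsest separator first, falling back to finer ones."""
    if len(text) <= max_size:
        return [text] if text.strip() else []
    if not separators:
        return fixed_size_chunks(text, max_size)
    head, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(head):
        candidate = f"{current}{head}{piece}" if current else piece
        if len(candidate) <= max_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_size:
            current = piece
        else:
            # Piece is still too large: retry with the finer separators.
            chunks.extend(recursive_chunks(piece, max_size, rest))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]
```

The recursive version keeps short paragraphs intact and only breaks inside a paragraph when that paragraph alone exceeds `max_size`, which is why it preserves structure better than fixed-size windows.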

The embed step converts each chunk into a dense vector. The embedding model is selected in the wizard (e.g., text-embedding-3-small).
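To show the shape of this step without calling a hosted model, the sketch below uses a deterministic hashing function as a stand-in embedder. `toy_embed` and `embed_chunks` are hypothetical names, and the hashed bag-of-words vector is not a real semantic embedding; in practice `embed_fn` would wrap whichever model was selected.

```python
import hashlib
import math
from typing import Callable


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedder: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed_chunks(
    chunks: list[str], embed_fn: Callable[[str], list[float]] = toy_embed
) -> list[list[float]]:
    """Map every chunk to one vector, preserving chunk order."""
    return [embed_fn(chunk) for chunk in chunks]


vectors = embed_chunks(["refund policy for orders", "shipping times by region"])
```

Passing the embedder in as a function keeps the rest of the pipeline unchanged when the model changes; only the vector dimension stored downstream has to match.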

The store step saves each embedding together with its chunk metadata for fast similarity search.
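A minimal in-memory sketch of storing vectors with metadata and querying by cosine similarity follows. The class and field names are hypothetical, and the search is a brute-force scan for clarity; production vector stores typically rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical store: keeps records in a list and scans them on search."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def add(self, vector: list[float], text: str, **metadata) -> None:
        self._records.append(Record(vector, text, metadata))

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, Record]]:
        # Score every record, then return the k most similar, best first.
        scored = [(cosine(query, r.vector), r) for r in self._records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([1.0, 0.0], "chunk a", source="guide.md", position=0)
store.add([0.0, 1.0], "chunk b", source="guide.md", position=1)
store.add([1.0, 1.0], "chunk c", source="faq.txt", position=0)
hits = store.search([1.0, 0.1], k=2)
```

Storing the source file and position alongside each vector lets a retrieval result be traced back to the exact passage it came from.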