Data Sources

Upload and organize documents for your RAG system in Step 2 of the wizard.

The Data Sources step is where you upload the documents that power your RAG system. You can upload files directly in the wizard or use the API for programmatic uploads.

Supported File Formats

Format	Extension	Max Size	Notes
PDF	`.pdf`	25MB	Text-extractable; OCR available for scanned PDFs
Plain Text	`.txt`	10MB	UTF-8 encoding recommended
Markdown	`.md`	10MB	Preserves headers and formatting
Word	`.docx`	25MB	Extracts text and basic structure
Excel	`.xlsx`, `.xls`	25MB	Each sheet processed separately
CSV	`.csv`	10MB	Rows become individual chunks
HTML	`.html`, `.htm`	10MB	Tags stripped, content extracted
JSON	`.json`	10MB	Must be valid JSON structure
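You can check the limits above locally before uploading. The sketch below is a hypothetical helper (`check_file` is not part of the GuidedMind API or SDK) that mirrors the table. It assumes "MB" means 1024 * 1024 bytes, which the service may count differently. It checks only the extension and size, not content rules such as JSON validity or whether a PDF's text can be extracted.

```python
from pathlib import Path
from typing import Optional

# Limits copied from the Supported File Formats table.
# Assumption: "MB" means mebibytes (1024 * 1024 bytes).
MB = 1024 * 1024
MAX_SIZE_BY_EXTENSION = {
    ".pdf": 25 * MB,
    ".txt": 10 * MB,
    ".md": 10 * MB,
    ".docx": 25 * MB,
    ".xlsx": 25 * MB,
    ".xls": 25 * MB,
    ".csv": 10 * MB,
    ".html": 10 * MB,
    ".htm": 10 * MB,
    ".json": 10 * MB,
}


def check_file(name: str, size_bytes: int) -> Optional[str]:
    """Return a reason the file would likely be rejected, or None if it looks acceptable."""
    ext = Path(name).suffix.lower()  # case-insensitive: REPORT.PDF -> .pdf
    limit = MAX_SIZE_BY_EXTENSION.get(ext)
    if limit is None:
        return f"unsupported extension: {ext or '(none)'}"
    if size_bytes > limit:
        return f"{name} is {size_bytes} bytes; limit for {ext} is {limit} bytes"
    return None
```

For example, `check_file("product-manual-v2.pdf", 2_456_789)` returns `None`, and `check_file("notes.rtf", 100)` reports the unsupported extension.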

Scanned PDFs require OCR processing, which increases processing time. If your PDFs contain images of text, enable OCR in the Document Processing step.

Upload Methods

Dashboard Upload (Wizard)

Navigate to Step 2: Data Sources in the wizard
Drag and drop files or click to browse
Select multiple files or upload one at a time
Click Upload to start processing

API Upload

Upload documents programmatically using the POST /rag/upload endpoint:

curl -X POST "https://api.guidedmind.ai/rag/upload" \
  -H "X-API-Key: rk_your_key_here" \
  -F "file=@/path/to/document.pdf"
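If neither curl nor the SDK is available, Python's standard library can send the same request. This is a minimal sketch that assumes the endpoint accepts a single multipart part named `file` with a generic `application/octet-stream` content type, as the curl `-F file=@...` form suggests. `build_multipart` and `upload` are illustrative helpers, not part of the GuidedMind SDK.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

UPLOAD_URL = "https://api.guidedmind.ai/rag/upload"


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data, similar to curl -F field=@file."""
    boundary = uuid.uuid4().hex  # random boundary unlikely to appear in the file bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str, api_key: str) -> bytes:
    """POST one file to /rag/upload and return the raw response body."""
    file = Path(path)
    body, content_type = build_multipart("file", file.name, file.read_bytes())
    request = urllib.request.Request(
        UPLOAD_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```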

Upload & Process

Upload and immediately process a document (chunking + embedding) in one request:

curl -X POST "https://api.guidedmind.ai/rag/upload-and-process" \
  -H "X-API-Key: rk_your_key_here" \
  -F "file=@/path/to/document.pdf" \
  -F 'config={"chunking":{"chunk_size":512,"chunk_overlap":50}}'
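Writing the `config` JSON by hand inside shell quotes is error-prone. The sketch below generates it in Python instead. The helper name `chunking_config` is hypothetical. The rule that `chunk_overlap` must be non-negative and smaller than `chunk_size` is an assumption about sensible values, not a documented API constraint.

```python
import json


def chunking_config(chunk_size: int = 512, chunk_overlap: int = 50) -> str:
    """Serialize chunking settings into the JSON string sent as the `config` form field."""
    # Assumption: overlap must be non-negative and smaller than the chunk size;
    # the service's actual validation rules may differ.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    config = {"chunking": {"chunk_size": chunk_size, "chunk_overlap": chunk_overlap}}
    # Compact separators reproduce the exact string used in the curl example.
    return json.dumps(config, separators=(",", ":"))
```

With the defaults, `chunking_config()` produces the same string as the curl example: `{"chunking":{"chunk_size":512,"chunk_overlap":50}}`.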

Use /rag/upload-and-process when you want the document to be searchable immediately. Use /rag/upload when you want to configure chunking settings first.

Python SDK

from guidedmind import Client
 
client = Client()
 
# Upload only
response = client.documents.upload(file_path="document.pdf")
print(f"Uploaded: {response.document_id}")
 
# Upload and process
response = client.documents.upload_and_process(file_path="document.pdf")
print(f"Processed: {response.chunks_created} chunks created")
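The same SDK calls can be extended to upload a whole folder. This sketch uses only `client.documents.upload(file_path=...)` and `response.document_id` from the example above. The helper `upload_directory` is hypothetical. The SDK's exception types are not documented here, so the sketch catches `Exception` broadly and records the error instead of stopping.

```python
from pathlib import Path
from typing import Dict

# Extensions from the Supported File Formats table.
SUPPORTED = {".pdf", ".txt", ".md", ".docx", ".xlsx", ".xls", ".csv", ".html", ".htm", ".json"}


def upload_directory(client, directory: str) -> Dict[str, str]:
    """Upload every supported file in `directory`; map file name -> document_id or error text."""
    results: Dict[str, str] = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue  # skip subfolders and unsupported formats
        try:
            response = client.documents.upload(file_path=str(path))
            results[path.name] = response.document_id
        except Exception as exc:  # specific SDK exception types are an unknown here
            results[path.name] = f"error: {exc}"
    return results
```

Usage would look like `upload_directory(Client(), "docs/")`, with the client configured as in the example above.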

Upload Response

{
  "document_id": "doc_abc123",
  "filename": "product-manual.pdf",
  "size_bytes": 2456789,
  "status": "uploaded",
  "uploaded_at": "2026-05-20T00:00:00Z"
}
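The response fields above map naturally onto a small typed record on the client side. `UploadResult` and `parse_upload_response` are hypothetical names. The sketch assumes every field in the example is always present. It treats `status` as a free-form string because the full set of status values is not listed in this section.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UploadResult:
    document_id: str
    filename: str
    size_bytes: int
    status: str
    uploaded_at: datetime


def parse_upload_response(body: str) -> UploadResult:
    """Turn the JSON upload response into a typed record."""
    data = json.loads(body)
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    timestamp = data["uploaded_at"].replace("Z", "+00:00")
    return UploadResult(
        document_id=data["document_id"],
        filename=data["filename"],
        size_bytes=int(data["size_bytes"]),
        status=data["status"],
        uploaded_at=datetime.fromisoformat(timestamp),
    )
```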

Document Metadata

Attach custom metadata to documents for filtering and organization:

curl -X POST "https://api.guidedmind.ai/rag/upload" \
  -H "X-API-Key: rk_your_key_here" \
  -F "file=@/path/to/document.pdf" \
  -F 'metadata={"department":"engineering","version":"2.1","category":"api-docs"}'
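In the example, every metadata value is a string. Note that `version` is sent as `"2.1"`, not as a number. The sketch below builds the `metadata` form value and rejects anything other than flat string pairs. That restriction is an assumption drawn from the example; the service may accept richer values. `metadata_field` is a hypothetical helper.

```python
import json
from typing import Mapping


def metadata_field(metadata: Mapping[str, str]) -> str:
    """Serialize metadata for the `metadata` form field as flat string key/value pairs."""
    for key, value in metadata.items():
        # Assumption: flat string pairs, as in the example; nested or numeric values
        # are rejected here rather than guessed at.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"metadata {key!r} must map a string to a string")
    return json.dumps(dict(metadata), separators=(",", ":"))
```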

Metadata is returned with each search result when the search request sets include_metadata: true:

{
  "content": "The API supports REST and GraphQL endpoints...",
  "score": 0.89,
  "metadata": {
    "department": "engineering",
    "version": "2.1",
    "category": "api-docs",
    "source": "api-reference.pdf"
  }
}
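Because metadata comes back with each result, you can also narrow results on the client. This sketch post-filters a list of result objects shaped like the example above. It is not the API's server-side filter syntax, which this section does not describe. `filter_by_metadata` is a hypothetical helper.

```python
from typing import Any, Dict, List


def filter_by_metadata(results: List[Dict[str, Any]], **wanted: str) -> List[Dict[str, Any]]:
    """Keep results whose metadata contains every requested key/value pair."""
    kept = []
    for result in results:
        metadata = result.get("metadata") or {}  # results without metadata match only empty filters
        if all(metadata.get(key) == value for key, value in wanted.items()):
            kept.append(result)
    return kept
```

For example, `filter_by_metadata(results, department="engineering", version="2.1")` keeps only results tagged with both values.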

File Preparation Tips

Do

Use text-extractable PDFs (not scanned images)
Name files descriptively (e.g., product-manual-v2.pdf)
Remove password protection before uploading
Split very large documents into logical sections
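For Markdown sources, "logical sections" often means top-level headings. This is a minimal sketch that assumes each `# ` heading starts a section worth uploading as its own file. It does not recognize fenced code blocks, so a `# ` comment line inside a code fence would be treated as a heading. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# A top-level heading: "# " at the start of a line ("## " does not match).
HEADING = re.compile(r"^# (.+)$", re.MULTILINE)


def split_markdown_sections(text: str) -> List[Tuple[str, str]]:
    """Split Markdown at top-level '# ' headings into (title, body) pairs."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return [("document", text)]
    sections = []
    preamble = text[: matches[0].start()].strip()
    if preamble:
        sections.append(("preamble", preamble))  # text before the first heading
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(1).strip(), text[match.start():end].strip()))
    return sections
```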

Don't

Upload corrupted or password-protected files
Mix unrelated content in one document
Upload duplicates without checking first
Use generic names like document.pdf
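One way to check for duplicates before uploading is to compare content hashes locally. This sketch treats only byte-identical files as duplicates, so two exports of the same PDF that differ in their bytes will not be caught. It also does not compare against documents already stored in GuidedMind, since this section does not describe a listing endpoint. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path) -> str:
    """Hash file contents in 1 MiB blocks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths: Iterable[Path]) -> List[List[Path]]:
    """Group files with identical bytes; each returned group has two or more paths."""
    by_hash: Dict[str, List[Path]] = {}
    for path in paths:
        by_hash.setdefault(sha256_of(path), []).append(path)
    return [group for group in by_hash.values() if len(group) > 1]
```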

Managing Documents

After upload, you can:

View document status and processing progress
Delete documents that are no longer needed
Re-process with different chunking settings
Check metadata attached to each document

Next Step

After uploading your documents, move to Document Processing to configure chunking and text processing options.
