Data indexing flow

Document Upload

User uploads document (PDF, image, YouTube URL, MP4, etc.)
Webapp validates file type
If applicable, webapp server uploads file to S3 storage
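
The validation step above can be sketched as follows. This is only an illustration: the allow-list of extensions, the YouTube host names, and the function names `classify_upload` and `needs_storage_upload` are assumptions, since this page does not state the actual supported types or how the webapp implements the check.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical allow-list; the real set of supported types is not given here.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".mp4"}
# Hypothetical set of host names treated as YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}


def classify_upload(source: str) -> str:
    """Return 'youtube' or 'file'; raise ValueError for unsupported input."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.hostname in YOUTUBE_HOSTS:
            return "youtube"
        raise ValueError(f"unsupported URL host: {parsed.hostname}")
    if Path(source).suffix.lower() in ALLOWED_EXTENSIONS:
        return "file"
    raise ValueError(f"unsupported file type: {source}")


def needs_storage_upload(kind: str) -> bool:
    # Assumption: only real files are sent to S3; a URL has nothing to store.
    return kind == "file"
```

Rejecting unknown types before any upload keeps unsupported files out of storage and out of the processing service.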

Document Processing

Webapp backend sends document to Python processing service
Processing service extracts text/content based on file type
Content is split into chunks
Processing service returns Document and DocumentChunk objects
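
A minimal sketch of the chunking step, assuming fixed-size character windows with a small overlap. The real service may split on sentences, tokens, or pages instead. The fields on `Document` and `DocumentChunk` are hypothetical, as are `split_into_chunks` and `process_text`; only the two class names come from this page.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str        # hypothetical field
    chunk_count: int  # hypothetical field


@dataclass
class DocumentChunk:
    index: int  # position of the chunk within the document
    text: str   # extracted text for this chunk


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def process_text(title: str, text: str) -> tuple[Document, list[DocumentChunk]]:
    """Build the Document and DocumentChunk objects from already-extracted text."""
    chunks = [DocumentChunk(i, t) for i, t in enumerate(split_into_chunks(text))]
    return Document(title=title, chunk_count=len(chunks)), chunks
```

Overlapping windows are one common way to avoid cutting a sentence's context at a chunk boundary; whether this project uses overlap is not stated here.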

Database Insertion

Webapp backend receives processed document and chunks
Vector embeddings are generated for each chunk's text
Document and chunk data inserted into database tables
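
The insertion step could look roughly like the sketch below. Everything here is a stand-in: an in-memory SQLite database replaces the project's real database, the table and column names are hypothetical, and `toy_embedding` is a deterministic placeholder derived from a hash, not a real embedding model.

```python
import hashlib
import json
import sqlite3


def toy_embedding(text: str, dims: int = 8) -> list[float]:
    """Placeholder vector built from a SHA-256 digest (not a real embedding)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real table layout is not described on this page.
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE document_chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id),
            chunk_index INTEGER NOT NULL,
            text TEXT NOT NULL,
            embedding TEXT NOT NULL  -- JSON-encoded list of floats
        );
        """
    )


def insert_document(conn: sqlite3.Connection, title: str, chunk_texts: list[str]) -> int:
    """Insert one document and its chunks in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO document_chunks (document_id, chunk_index, text, embedding) "
            "VALUES (?, ?, ?, ?)",
            [(doc_id, i, t, json.dumps(toy_embedding(t))) for i, t in enumerate(chunk_texts)],
        )
    return doc_id
```

Writing the document row and all chunk rows in one transaction means a failed embedding or insert leaves no partially indexed document behind.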

Indexing Completion

Database confirms successful insertion
User notified of completed indexing
Document now available for vector similarity search
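
To illustrate what "available for vector similarity search" means, here is a small sketch that ranks stored chunks by cosine similarity to a query vector. It is a brute-force, in-memory illustration under assumed names (`cosine_similarity`, `top_k`); a production setup would typically delegate this to the database or a vector index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 3):
    """Return up to k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

The query vector must come from the same embedding model as the stored chunk vectors, otherwise the scores are not comparable.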
