Architecture
Data flow
Data indexing flow
Document Upload
- User uploads document (PDF, image, YouTube URL, MP4, etc.)
- Webapp validates file type
- If applicable, webapp server uploads file to S3 storage
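The validation step above could be sketched as follows. This is only an illustration: the extension table, the YouTube host list, and the function names `classify_upload` and `needs_s3_upload` are assumptions, since this document does not give the webapp's actual list of accepted types.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping; the real set of accepted types is not specified here.
EXTENSION_KINDS = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".mp4": "video",
}
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}


def classify_upload(source: str) -> str:
    """Return a coarse kind for a file name or URL, or raise ValueError."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # urlparse lowercases the hostname for us.
        if parsed.hostname in YOUTUBE_HOSTS:
            return "youtube"
        raise ValueError(f"unsupported URL: {source}")
    suffix = PurePosixPath(source).suffix.lower()
    if suffix not in EXTENSION_KINDS:
        raise ValueError(f"unsupported file type: {suffix or source}")
    return EXTENSION_KINDS[suffix]


def needs_s3_upload(kind: str) -> bool:
    # Assumption: only real files are stored; a YouTube URL has no file to upload.
    return kind != "youtube"
```

Under these assumptions, "If applicable" in the last step means `needs_s3_upload` returns `True` for files and `False` for URL-only sources.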
Document Processing
- Webapp backend sends document to Python processing service
- Processing service extracts text/content based on file type
- Content is split into chunks
- Processing service returns Document and DocumentChunk objects
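As a rough sketch of the chunking and return step, assuming fixed-size character chunks with a small overlap. The real splitting strategy is not described in this document, and the fields on `Document` and `DocumentChunk` below, as well as the `process_text` helper, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    id: str
    title: str


@dataclass
class DocumentChunk:
    document_id: str
    index: int
    text: str


def process_text(doc_id: str, title: str, text: str,
                 size: int = 200, overlap: int = 20):
    """Split extracted text into overlapping chunks (sizes are assumptions)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(DocumentChunk(doc_id, index, text[start:start + size]))
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return Document(doc_id, title), chunks
```

Overlapping chunks are one common choice because they keep a sentence that straddles a boundary visible in both neighbors; a sentence-aware or token-based splitter would fit the same interface.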
Database Insertion
- Webapp backend receives processed document and chunks
- Vector embeddings are generated for each chunk's text
- Document and chunk data inserted into database tables
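The insertion step might look like the sketch below. Everything concrete in it is an assumption: the table and column names are invented, SQLite stands in for whatever database the webapp uses, and `toy_embedding` is a deterministic placeholder for a real embedding model, which this document does not name.

```python
import hashlib
import json
import sqlite3


def toy_embedding(text: str, dims: int = 8) -> list:
    """Placeholder for a real embedding model: a hashed bag of words."""
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    return vec


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; embeddings are stored as JSON text for simplicity.
    conn.executescript("""
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE document_chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id),
            chunk_index INTEGER NOT NULL,
            text TEXT NOT NULL,
            embedding TEXT NOT NULL
        );
    """)


def insert_document(conn: sqlite3.Connection, title: str, chunk_texts: list) -> int:
    """Insert a document and its chunks in one transaction; return the document id."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO document_chunks (document_id, chunk_index, text, embedding) "
            "VALUES (?, ?, ?, ?)",
            [(doc_id, i, t, json.dumps(toy_embedding(t)))
             for i, t in enumerate(chunk_texts)],
        )
    return doc_id
```

Wrapping both inserts in a single transaction means a document never appears without its chunks, which lets a successful commit serve as the insertion confirmation.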
Indexing Completion
- Database confirms successful insertion
- User notified of completed indexing
- Document now available for vector similarity search
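To illustrate what vector similarity search over the indexed chunks means, here is a minimal in-memory sketch using cosine similarity. The metric and the `top_k` helper are assumptions; a production setup would more likely run the ranking inside a vector-capable database than in application code.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, indexed, k: int = 3):
    """Return up to k (chunk_id, score) pairs, most similar first.

    `indexed` maps a chunk id to its stored embedding vector.
    """
    scored = [(chunk_id, cosine_similarity(query, vec))
              for chunk_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

At query time, the query text would be embedded with the same model used for the chunks, and the resulting vector passed as `query`.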