Data
Data Structure
Note
You can find the data structure in the Prisma schema located at webapp/src/prisma/schema.prisma.
We use PostgreSQL with the pgvector extension to store the data.
Types of Documents
Ada can ingest multiple types of documents:
- PDFs
- Images
- YouTube videos
- Websites
- ...
Document Structure
Documents are processed and stored in a three-level hierarchy:
- Document Level
  - The top-level entity representing the document
  - Contains basic file information, e.g. name, original path, S3 access...
- Chunks Level
  - Documents are broken down into smaller text chunks for easier indexing
  - Each chunk's text is transformed into a vector embedding
  - These embeddings are used for semantic search
- Metadata Level
  - Each chunk can have associated metadata
  - Stores contextual information about the chunk, e.g. page number and bounding box
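The three-level hierarchy can be pictured as plain Python dataclasses. This is a minimal sketch for illustration only, not the Prisma schema: the class names, the fields `s3_key`, `page_number` and `bbox`, and the fixed-size character chunker with its `chunk_size`/`overlap` parameters are all hypothetical assumptions. The actual fields live in webapp/src/prisma/schema.prisma, and the real chunking strategy may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metadata:
    """Level 3 (hypothetical fields): contextual information for one chunk."""
    page_number: Optional[int] = None
    # Assumed bounding box layout: (x0, y0, x1, y1) on the page.
    bbox: Optional[tuple[float, float, float, float]] = None


@dataclass
class Chunk:
    """Level 2: a piece of document text plus its vector embedding."""
    text: str
    embedding: Optional[list[float]] = None  # filled later by an embedding model (not shown)
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class Document:
    """Level 1: the top-level entity with basic file information."""
    name: str
    original_path: str
    s3_key: Optional[str] = None  # hypothetical stand-in for "S3 access"
    chunks: list[Chunk] = field(default_factory=list)


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return pieces


def build_document(name: str, path: str, pages: list[str],
                   chunk_size: int = 500, overlap: int = 50) -> Document:
    """Build the Document -> Chunk -> Metadata hierarchy from per-page text."""
    doc = Document(name=name, original_path=path)
    for page_number, page_text in enumerate(pages, start=1):
        for piece in chunk_text(page_text, chunk_size, overlap):
            doc.chunks.append(Chunk(text=piece, metadata=Metadata(page_number=page_number)))
    return doc


# Usage with tiny, made-up pages so the chunk boundaries are easy to see.
doc = build_document("report.pdf", "/uploads/report.pdf", ["abcdefghij", "xyz"],
                     chunk_size=4, overlap=1)
print([(c.metadata.page_number, c.text) for c in doc.chunks])
# [(1, 'abcd'), (1, 'defg'), (1, 'ghij'), (2, 'xyz')]
```

The overlap reduces the chance that a phrase split across a chunk boundary is lost to both neighbouring chunks. Fixed character windows are only one option; splitting on sentences, paragraphs, or token counts is equally plausible.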
Semantic Search Implementation
The semantic search capability is enabled by:
- Breaking documents into manageable chunks
- Converting chunk text into vector embeddings
- Storing these embeddings in PostgreSQL via pgvector
- Ranking chunks by vector similarity between their embeddings and the query's embedding
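The search side can be sketched the same way. The snippet below ranks chunks in memory by cosine distance (1 - cosine similarity), which is the quantity pgvector's `<=>` operator returns; pgvector also provides `<->` (L2 distance) and `<#>` (negative inner product), and which one Ada uses is not stated here. The function names, the toy 2-D embeddings, and the chunk texts are hypothetical. Inside PostgreSQL the equivalent ranking would be a query ending in something like `ORDER BY embedding <=> $1 LIMIT 5`, where the column name `embedding` is an assumption.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity, the same quantity pgvector's <=> operator computes."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare against a zero vector")
    return 1.0 - dot / norm


def search(query_embedding: list[float],
           chunks: list[tuple[str, list[float]]],
           k: int = 3) -> list[tuple[str, float]]:
    """Return the k (text, distance) pairs closest to the query, nearest first."""
    scored = [(text, cosine_distance(query_embedding, emb)) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D embeddings; real embeddings come from a model and typically have
# hundreds or thousands of dimensions.
chunks = [
    ("invoice totals", [1.0, 0.0]),
    ("shipping address", [0.0, 1.0]),
    ("payment due date", [0.8, 0.6]),
]
for text, dist in search([1.0, 0.1], chunks, k=2):
    print(f"{text}: {dist:.3f}")
# invoice totals: 0.005
# payment due date: 0.144
```

A brute-force scan like this is only practical for small collections; pgvector can also build approximate indexes (IVFFlat, HNSW) so that the database does not have to compare the query against every stored embedding.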