Science Infuse

Technologies

AI Models

Most AI models in Ada run locally

Running them locally ensures data privacy and RGPD compliance and reduces latency. This approach allows for:

  • Control over the models and their versions
  • Enhanced data security by keeping all processing on-premises
  • Adherence to RGPD (the EU General Data Protection Regulation, GDPR in English) requirements

Speech Recognition

  • Whisper: Speech recognition model used to transcribe videos into text.

Image Analysis

  • Florence-2: Advanced vision model employed to generate detailed descriptions of images extracted from PDFs.

Language Translation

PDF Structure Analysis

  • Surya: Sophisticated PDF analyzer that:
    • Identifies and categorizes text blocks, titles, and images
    • Determines the optimal reading order of content blocks
    • Enhances document understanding for further processing
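
As an illustration of how downstream steps might consume this kind of layout analysis, here is a minimal sketch. It assumes each detected block carries a label and a reading-order index; the `LayoutBlock` class, its fields, and both helper functions are hypothetical and do not reflect Surya's actual output schema.

```python
from dataclasses import dataclass


@dataclass
class LayoutBlock:
    """Hypothetical container for one block found on a PDF page."""
    label: str      # e.g. "title", "text", "image"
    position: int   # reading-order index assigned by the analyzer
    content: str    # extracted text, or a reference to an extracted image


def in_reading_order(blocks):
    """Return the blocks sorted by their reading-order index."""
    return sorted(blocks, key=lambda block: block.position)


def images_to_describe(blocks):
    """Collect image references, in reading order, for an image captioning step."""
    return [b.content for b in in_reading_order(blocks) if b.label == "image"]


# Made-up blocks, listed out of order on purpose.
blocks = [
    LayoutBlock("text", 1, "Plants convert light into chemical energy."),
    LayoutBlock("image", 2, "page1_figure1.png"),
    LayoutBlock("title", 0, "Photosynthesis"),
]
ordered = in_reading_order(blocks)
```

Keeping the label and the order index on every block lets later stages (image description, embedding) work on a clean, ordered sequence without re-parsing the PDF.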

Text Embeddings

  • Solon: Open-source embedding model chosen for French semantic search.
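
To show how embedding-based semantic search works in general, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are made-up toy values rather than real Solon outputs, and `cosine_similarity` and `rank_documents` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, for illustration only).
documents = {
    "photosynthese": [0.9, 0.1, 0.0],
    "volcans": [0.0, 0.2, 0.9],
    "respiration": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]
ranking = rank_documents(query, documents)
```

With these toy values the query points in nearly the same direction as the "photosynthese" vector, so that document ranks first and "volcans" last.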

Text Generation


Core Technological Components

Vector Database

  • pgvector: High-performance vector search extension for PostgreSQL, chosen for its:
    • Fast search capabilities
    • Native support within the existing PostgreSQL infrastructure
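
A nearest-neighbour lookup with pgvector might look like the sketch below. It only builds the SQL text and parameters, using pgvector's `<=>` cosine distance operator. The `document_chunks` table, its columns, and the helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_query(embedding: list[float], limit: int = 5):
    """Return (sql, params) for a nearest-neighbour search by cosine distance."""
    sql = (
        "SELECT id, content "
        "FROM document_chunks "                   # hypothetical table name
        "ORDER BY embedding <=> %s::vector "      # <=> is cosine distance in pgvector
        "LIMIT %s"
    )
    return sql, (to_vector_literal(embedding), limit)


sql, params = build_search_query([0.5, 0.25], limit=3)
```

Passing the vector as a bound parameter rather than interpolating it into the SQL string keeps the query safe to run with user-derived input.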

Processing Backend

  • Python: Powerful and versatile programming language used for:
    • Serving AI models
    • Handling complex data processing pipelines
  • FastAPI: Modern, high-performance web framework for building APIs with Python, chosen for:
    • Asynchronous request handling
    • Easy integration with AI models
    • Pydantic-based typing for automatic data validation and serialization

Web Application Framework

  • Next.js: Modern React framework powering both the front end and the back end of our web application (except for the processing backend described above)

Document Storage

  • S3: Object storage solution used to store all original reference documents (PDF, image, video, etc.), offering:
    • Scalable and secure storage infrastructure
    • Easy integration with existing systems
    • Cost-effective storage for large volumes of data
    • Compliance with data protection regulations

Deployment

  • Docker and Docker Compose: Containerized deployment for easy scaling and management
