
Prodigy is an AI-powered knowledge management platform utilizing Large Language Models (LLMs) and the Retrieval-Augmented Generation (RAG) technique for Q&A systems.


vignesh865/prodigy


Prodigy

Prodigy uses OpenAI models in production and a quantized Mistral model for local development. Data is ingested through pipelines connected to Google Drive and Kafka, and questions are answered with retrieval-augmented generation (RAG) over the ingested documents using Large Language Models (LLMs).

Demo

In this demo, we query 270 pages of unstructured documents that were loaded directly from Google Drive and ingested into Qdrant through the data pipeline. The context for each query is retrieved with a hybrid search method (BM25 for text search and Hugging Face embeddings for semantic search). The retrieved context is then fed to the LLM, and the generated answer is streamed to the frontend.
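The hybrid retrieval step described above has to merge two ranked result lists into one. The sketch below shows one common way to do that, reciprocal rank fusion (RRF). It is an illustration only: the README does not say which fusion method Prodigy uses, and the function name, the constant k=60, and the document ids are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. from the BM25 text search
semantic_hits = ["doc1", "doc4", "doc3"]  # e.g. from the embedding search

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['doc1', 'doc3', 'doc4', 'doc7']
```

Here doc1 and doc3 appear in both lists and therefore outrank doc4 and doc7, which each appear in only one. The top entries of the fused list would then be passed to the LLM as context.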

The answer to each query and the relevant context from the PDF are shown side by side in the videos below (each thumbnail links to the demo video hosted on YouTube).

V1 Video With Single Document System:

[Demo video thumbnail]

V3 Video with Authentication and Multi-Document System:

Rag Demo V3 Thumbnail


Setup Instructions

Prodigy is a full-stack application. The steps below cover the initial setup and a first run. Kafka and Redis are part of the application stack but are not needed for the initial setup: comment out source_consumer/apps.py, follow the instructions below, and set up Kafka and Redis once the initial setup is complete.

These instructions are optimized for PyCharm.

Prerequisites

  1. PyCharm: Ensure you have PyCharm installed.
  2. Python: Make sure Python is installed on your machine.
  3. Google Cloud Account: Required for Google account integration.
  4. Qdrant Account: Required for Qdrant client integration.

Initial Setup

  1. Import the Project:

    • Open PyCharm and import the project.
    • Create a new Python interpreter. PyCharm will automatically create a venv folder.
  2. Restart PyCharm:

    • Restart the IDE.
    • The terminal should be prefixed with (venv). For example:
      (venv) vignesh@Vigneshs-MacBook-Pro prodigy %
      
  3. Install Requirements:

    • Open the terminal and run:
      pip install -r requirements.txt
  4. Database Migrations:

    • Run the following commands to set up the database:
      python manage.py makemigrations
      python manage.py migrate
  5. Seed Initial Data:

    • Open the Django shell:
      python manage.py shell
    • In the shell, run the commands from init_run.shell one by one.
    • Exit the shell:
      exit()
  6. Start Backend Server:

    • Run the backend server:
      python manage.py runserver 8080
  7. Start Frontend Server:

    • Run the frontend server:
      streamlit run frontend/About.py
    • It is preferred to run this in a PyCharm configuration because it sets the Python path automatically.

At this point, even without the additional setup below, you should be able to start the app, log in, and log out.
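If the app does not load, a quick way to see whether the servers are listening is to probe their ports. The sketch below is a minimal check that assumes the backend runs on port 8080 as in the runserver command above and that Streamlit uses its default port 8501. The helper name port_is_open is hypothetical and is not part of the project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 8080 comes from `python manage.py runserver 8080`;
    # 8501 is Streamlit's default port (an assumption about this setup).
    for name, port in [("backend", 8080), ("frontend", 8501)]:
        state = "up" if port_is_open("127.0.0.1", port) else "down"
        print(f"{name} on port {port}: {state}")
```

This only confirms that something is listening on each port. It does not verify that login works.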

Additional Setup

  1. Google Account Integration:

    • Create a Google Cloud project, then create an OAuth 2.0 client secret in the Google Cloud console and save it in the resources/secrets folder as client_secret.json.
    • Detailed documentation for this setup will be updated soon.
  2. Qdrant Client Integration:

    • Create a free Qdrant server.
    • Detailed documentation for this setup will be updated soon.
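Until the detailed documentation is available, a small check like the one below can confirm that the OAuth file from the Google account step is in place. It is a sketch under assumptions: it relies on the usual layout of a client secret file downloaded from the Google Cloud console (a top-level "web" or "installed" object containing client_id and client_secret), and the function name check_client_secret is hypothetical.

```python
import json
from pathlib import Path


def check_client_secret(path):
    """Return a list of problems; an empty list means the file looks usable."""
    path = Path(path)
    if not path.is_file():
        return [f"missing file: {path}"]
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Downloaded client secrets are wrapped in "web" or "installed".
    section = data.get("web") or data.get("installed")
    if not isinstance(section, dict):
        return ['expected a top-level "web" or "installed" object']
    return [
        f"missing key: {key}"
        for key in ("client_id", "client_secret")
        if not section.get(key)
    ]


if __name__ == "__main__":
    # Path taken from the Google account integration step above.
    problems = check_client_secret("resources/secrets/client_secret.json")
    print("client_secret.json looks fine" if not problems else problems)
```

Run it from the project root so that the relative path resolves.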

This README provides the necessary steps to get your Prodigy app running. If you have any questions or run into issues, feel free to ask for help.
