
anasserhussien/movie_llm_recsys


LLM-Based Movie RecSys

Configuration

Create a .env file in the root directory of the project to store your MongoDB and Milvus/Zilliz connection settings. It must define the following variables:

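MONGO_URI=<your MongoDB connection string>
MILVUS_URI=<your Milvus/Zilliz endpoint URI>
MILVUS_TOKEN=<your Milvus/Zilliz access token>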
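The scripts need these three values at runtime. The following is a minimal, standard-library-only sketch of reading them from .env and failing early when one is missing; the function names read_env_file and load_config are hypothetical, and the repository itself may load the file differently (for example with the python-dotenv package).

```python
import os
from pathlib import Path

# Variable names taken from the Configuration section above.
REQUIRED_VARS = ("MONGO_URI", "MILVUS_URI", "MILVUS_TOKEN")


def read_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ (hypothetical helper).

    Blank lines and '#' comments are skipped, and variables that are
    already set in the environment are not overwritten.
    """
    env_file = Path(path)
    if not env_file.is_file():
        return
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URIs with query strings stay intact.
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def load_config(path: str = ".env") -> dict:
    """Return the required settings, raising if any of them is missing."""
    read_env_file(path)
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings in {path}: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Calling load_config() once at startup surfaces a missing or misspelled variable immediately, instead of as a connection error deep inside a script.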

Exporting the Movies Dataset

You can skip this step and use the provided movies.json file, which already contains the extracted movies.

To export the movies dataset from the movies collection of the sample_mflix database in your MongoDB instance and save it as movies.json, run the db_export.py script. The script connects to your MongoDB instance, retrieves the dataset, and writes it to a JSON file for easy sharing and analysis.
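A rough sketch of what such an export can look like, using PyMongo. It assumes the standard MongoDB sample dataset layout (database sample_mflix, collection movies); the function names write_movies_json and export_movies are hypothetical and not taken from db_export.py. BSON-specific values such as ObjectId and datetime are not JSON-serializable, so they are converted to strings.

```python
import json
from typing import Iterable


def write_movies_json(docs: Iterable[dict], out_path: str = "movies.json") -> int:
    """Write documents to a JSON array file and return how many were written."""
    movies = list(docs)
    with open(out_path, "w", encoding="utf-8") as f:
        # default=str turns BSON-only types (ObjectId, datetime) into plain strings.
        json.dump(movies, f, default=str, ensure_ascii=False, indent=2)
    return len(movies)


def export_movies(mongo_uri: str, out_path: str = "movies.json", limit: int = 0) -> int:
    """Dump sample_mflix.movies to a JSON file (limit=0 means no limit)."""
    # Third-party dependency, imported here so write_movies_json works without it.
    from pymongo import MongoClient

    client = MongoClient(mongo_uri)
    try:
        # Assumes the standard MongoDB sample dataset database and collection names.
        cursor = client["sample_mflix"]["movies"].find({}, limit=limit)
        return write_movies_json(cursor, out_path)
    finally:
        client.close()
```

With the .env settings loaded, export_movies(os.environ["MONGO_URI"]) would produce a movies.json comparable to the one shipped with the repository; passing a small limit is handy for a quick trial run.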

Obtain Embeddings

To generate an embedding for each movie from its description, run the get_embeddings.py script. It uses the all-MiniLM-L6-v2 model from the Sentence Transformers library, which produces a 384-dimensional embedding for each movie description. The script adds the embedding to each movie record and saves the resulting dataset as movies_with_embeddings.json.
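A minimal sketch of this step, assuming the description lives in the plot field (as in MongoDB's sample movies data) and that movies without a description are skipped; both choices, and the names add_embeddings and sentence_transformer_encoder, are assumptions rather than details of get_embeddings.py. The encoder is passed in as a function so the bookkeeping can be exercised without downloading the model.

```python
import json
from typing import Callable, List, Sequence

# An encoder maps a batch of texts to one vector per text.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def sentence_transformer_encoder(model_name: str = "all-MiniLM-L6-v2") -> Encoder:
    """Wrap a Sentence Transformers model; all-MiniLM-L6-v2 yields 384-dim vectors."""
    from sentence_transformers import SentenceTransformer  # third-party

    model = SentenceTransformer(model_name)
    # encode() returns a NumPy array by default; tolist() makes it JSON-friendly.
    return lambda texts: model.encode(texts, batch_size=64, show_progress_bar=True).tolist()


def add_embeddings(movies: List[dict], encode: Encoder, text_field: str = "plot") -> List[dict]:
    """Attach an 'embedding' list to every movie with a non-empty text_field.

    Movies without a description are dropped, since there is nothing to embed.
    """
    kept = [m for m in movies if m.get(text_field)]
    vectors = encode([m[text_field] for m in kept])
    for movie, vector in zip(kept, vectors):
        movie["embedding"] = [float(x) for x in vector]
    return kept


def main(in_path: str = "movies.json", out_path: str = "movies_with_embeddings.json") -> None:
    """Read movies.json, embed every description, write movies_with_embeddings.json."""
    with open(in_path, encoding="utf-8") as f:
        movies = json.load(f)
    movies = add_embeddings(movies, sentence_transformer_encoder())
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(movies, f, ensure_ascii=False)
```

Encoding all descriptions in one encode() call lets the library batch them internally, which is considerably faster than calling the model once per movie.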

Populate Database

The populate_milvus_db.py script is designed to:

  • Connect to your Milvus/Zilliz database.
  • Create a collection named movies_collection to store the movie data and the corresponding embeddings.
  • Index the collection for efficient vector similarity search.

This script sets up the database and prepares it for recommendation queries, enabling fast retrieval of movies with similar descriptions.
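The steps listed above might look roughly like the following with pymilvus's MilvusClient interface. This is a sketch, not the contents of populate_milvus_db.py: the schema (integer id, title, 384-dimensional embedding), the AUTOINDEX index with cosine similarity, the drop-and-recreate behavior, and the helper names to_rows, populate, and similar_titles are all assumptions. Note that populate deletes any existing movies_collection before recreating it.

```python
import json
import os
from typing import List

EMBEDDING_DIM = 384  # all-MiniLM-L6-v2 output size
COLLECTION = "movies_collection"


def to_rows(movies: List[dict], max_title_chars: int = 256) -> List[dict]:
    """Turn movie records into Milvus rows with an integer primary key."""
    return [
        {
            "id": i,
            # VARCHAR max_length counts bytes; 256 chars x 4 bytes fits max_length=1024.
            "title": str(movie.get("title", ""))[:max_title_chars],
            "embedding": movie["embedding"],
        }
        for i, movie in enumerate(movies)
    ]


def populate(movies_path: str = "movies_with_embeddings.json", batch_size: int = 500):
    """Create and index movies_collection, then insert all movies in batches."""
    from pymilvus import DataType, MilvusClient  # third-party

    client = MilvusClient(uri=os.environ["MILVUS_URI"], token=os.environ["MILVUS_TOKEN"])
    if client.has_collection(COLLECTION):
        client.drop_collection(COLLECTION)  # start fresh so reruns do not duplicate rows

    schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=False)
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=1024)
    schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)

    # Vector index for similarity search; AUTOINDEX lets Milvus/Zilliz choose the index type.
    index_params = client.prepare_index_params()
    index_params.add_index(field_name="embedding", index_type="AUTOINDEX", metric_type="COSINE")
    client.create_collection(collection_name=COLLECTION, schema=schema, index_params=index_params)

    with open(movies_path, encoding="utf-8") as f:
        rows = to_rows(json.load(f))
    for start in range(0, len(rows), batch_size):
        client.insert(collection_name=COLLECTION, data=rows[start:start + batch_size])
    return client


def similar_titles(client, query_vector: List[float], k: int = 5) -> List[str]:
    """Return the titles of the k movies whose embeddings are closest to query_vector."""
    results = client.search(
        collection_name=COLLECTION, data=[query_vector], limit=k, output_fields=["title"]
    )
    return [hit["entity"]["title"] for hit in results[0]]
```

Cosine similarity is a natural fit for Sentence Transformers embeddings, which are compared by angle rather than magnitude. A recommendation query then reduces to embedding a movie's description (or a free-text prompt) with the same model and passing that vector to similar_titles.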
