Create a .env file in the root directory of the project to store your MongoDB and Milvus/Zilliz connection settings:
MONGO_URI
MILVUS_URI
MILVUS_TOKEN
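As a rough illustration of how the scripts might read these three values, here is a minimal standard-library sketch. The helper names parse_env_text and load_settings are hypothetical and are not taken from the project, which may well use a package such as python-dotenv instead. The sketch assumes the .env file holds one KEY=VALUE pair per line.

```python
import os

# The three keys named above; the values are whatever your deployments use.
REQUIRED_KEYS = ("MONGO_URI", "MILVUS_URI", "MILVUS_TOKEN")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URIs with query strings stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path=".env"):
    """Return the required settings, preferring real environment variables."""
    file_values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            file_values = parse_env_text(fh.read())
    settings = {key: os.environ.get(key, file_values.get(key)) for key in REQUIRED_KEYS}
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

Failing early when a key is missing tends to give a clearer error than a connection timeout later in the pipeline.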
You can skip the export step below and use the provided movies.json, which already contains the extracted movies.
To export the movies dataset from the sample_mflix collection in your MongoDB database and save it as a movies.json file, run the db_export.py script. The script connects to your MongoDB instance, retrieves the dataset, and writes it to a JSON file for easy sharing and analysis.
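The export step could look roughly like the sketch below. It is not the contents of db_export.py: the function names, and the db_name and collection_name parameters, are assumptions made for illustration. It relies on the pymongo package, imported lazily so the JSON-writing helper can be used without it.

```python
import json


def write_movies(documents, path="movies.json"):
    """Write an iterable of documents to a JSON file and return the count."""
    docs = list(documents)
    with open(path, "w", encoding="utf-8") as fh:
        # default=str turns values JSON cannot encode natively
        # (for example ObjectId or datetime) into strings.
        json.dump(docs, fh, default=str, ensure_ascii=False, indent=2)
    return len(docs)


def export_movies(mongo_uri, db_name, collection_name, path="movies.json"):
    """Fetch every document from one collection and save it as JSON."""
    from pymongo import MongoClient  # third-party; install with pip

    client = MongoClient(mongo_uri)
    try:
        cursor = client[db_name][collection_name].find({})
        return write_movies(cursor, path)
    finally:
        client.close()
```

Converting unsupported values with default=str keeps the file portable, at the cost of losing the original BSON types.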
To generate embeddings for each movie based on its description, run the get_embeddings.py script. It uses the all-MiniLM-L6-v2 model from the Sentence Transformers library, which produces a 384-dimensional embedding for each movie description. The embeddings are added to each movie, and the dataset is saved as movies_with_embeddings.json.
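A minimal sketch of the embedding step follows, under stated assumptions: the description is assumed to live in a field called plot and the vector is stored under a key called embedding. Both names are guesses, not confirmed by the project, and get_embeddings.py may be organized differently. The sentence_transformers import is deferred so the pure-Python helper can be exercised on its own.

```python
import json

MODEL_NAME = "all-MiniLM-L6-v2"  # produces 384-dimensional vectors


def add_embeddings(movies, encode, text_field="plot"):
    """Return copies of the movies with an 'embedding' list attached.

    `encode` is any callable mapping a list of strings to a sequence of
    vectors, such as SentenceTransformer.encode.
    """
    # Movies without a description are embedded as an empty string.
    texts = [movie.get(text_field) or "" for movie in movies]
    vectors = encode(texts)
    return [
        {**movie, "embedding": [float(x) for x in vector]}
        for movie, vector in zip(movies, vectors)
    ]


def main(in_path="movies.json", out_path="movies_with_embeddings.json"):
    from sentence_transformers import SentenceTransformer  # third-party

    model = SentenceTransformer(MODEL_NAME)
    with open(in_path, encoding="utf-8") as fh:
        movies = json.load(fh)
    enriched = add_embeddings(movies, model.encode)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(enriched, fh)
```

Passing all descriptions to encode in one call lets the library batch them internally, which is usually faster than encoding one movie at a time.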
The populate_milvus_db.py script is designed to:
- Connect to your Milvus/Zilliz database.
- Create a collection, movies_collection, to store movie data and their corresponding embeddings.
- Index the collection for efficient vector similarity search.
This script sets up the database and prepares it for recommendation queries, enabling quick and efficient retrieval of similar movie descriptions.
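The steps above can be sketched with the pymilvus MilvusClient interface. This is an illustration under assumptions, not the project's actual script: it uses the client's quick-setup create_collection call (which, as far as I know, creates an id primary key, a vector field, and a default index), and the helper names, the title and plot fields, the COSINE metric, and the batch size are all hypothetical choices.

```python
def to_rows(movies, text_field="plot"):
    """Map movies with an 'embedding' key to rows for insertion."""
    return [
        {
            "id": index,  # integer primary key; the list position is used here
            "vector": movie["embedding"],
            "title": movie.get("title", ""),
            text_field: movie.get(text_field, ""),
        }
        for index, movie in enumerate(movies)
    ]


def populate(uri, token, movies, collection_name="movies_collection",
             dim=384, batch_size=500):
    """Create the collection if needed and insert the movies in batches."""
    from pymilvus import MilvusClient  # third-party; install with pip

    client = MilvusClient(uri=uri, token=token)
    if not client.has_collection(collection_name):
        client.create_collection(
            collection_name=collection_name,
            dimension=dim,          # must match the embedding size
            metric_type="COSINE",   # assumed similarity metric
        )
    rows = to_rows(movies)
    for start in range(0, len(rows), batch_size):
        client.insert(collection_name=collection_name,
                      data=rows[start:start + batch_size])
    return client


def find_similar(client, query_vector, collection_name="movies_collection", limit=5):
    """Return the nearest movies to a query embedding."""
    return client.search(collection_name=collection_name, data=[query_vector],
                         limit=limit, output_fields=["title"])
```

Inserting in batches keeps each request small; the right batch size depends on your deployment's request limits.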