I have searched the existing feature requests, and this feature request has not already been filed.
I believe this is a legitimate feature request, not just a question or bug.
Feature Request Description
Issue/Feature Description:
First off, I want to express my gratitude for lightRAG—it’s been a fantastic tool for my projects. I'd like to contribute a quickstart setup that makes it even easier for users to deploy lightRAG using Docker. This setup supports integration with OpenAI, Milvus (as Vector DB), Redis (for KV Storage), MongoDB (for Document Storage), and Neo4j (for Graph DB).
Contribution Details
I propose adding a docker-compose file and an example .env file to the repository. These files will serve as a quickstart guide for people to get lightRAG up and running with all the required storage services.
Docker Compose file:
```yaml
services:
  lightrag:
    build: .
    ports:
      - "${PORT:-9621}:9621"
    volumes:
      - ./data/rag_storage:/app/data/rag_storage
      - ./data/inputs:/app/data/inputs
      - ./config.ini:/app/config.ini
      - ./.env:/app/.env
    env_file:
      - .env
    restart: unless-stopped
    depends_on:
      - redis
      - mongo
      - milvus
      - neo4j

  redis:
    image: redis:7.4.2-alpine3.21
    container_name: lightrag-server_redis
    restart: always
    ports:
      - "6379:6379" # Exposes container's port 6379 on host's port 6379
    volumes:
      - lightrag_redis_data:/data

  neo4j:
    image: neo4j:5.26.4-community
    container_name: lightrag-server_neo4j-community
    restart: always
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_AUTH=${NEO4J_USERNAME}/${NEO4J_PASSWORD}
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_server_bolt_listen__address=0.0.0.0:7687
      - NEO4J_server_bolt_advertised__address=neo4j:7687
    volumes:
      - ./neo4j/plugins:/var/lib/neo4j/plugins # I added this because the APOC plugin file has to be downloaded into this folder manually.
      - lightrag_neo4j_import:/var/lib/neo4j/import
      - lightrag_neo4j_data:/data
      - lightrag_neo4j_backups:/backups

  etcd: # etcd, minio and milvus are copied from Milvus's own docker-compose file.
    container_name: lightrag-server_milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - lightrag_etcd_data:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: lightrag-server_milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - lightrag_minio_data:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  milvus:
    container_name: lightrag-server_milvus-standalone
    image: milvusdb/milvus:v2.4.15
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - lightrag_milvus_data:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio

  mongo:
    image: mongodb/mongodb-community-server:7.0.8-ubi8
    container_name: lightrag-server_mongo-community
    restart: always
    ports:
      - "27017:27017"
    environment:
      - MONGODB_INITDB_ROOT_USERNAME=${MONGO_USERNAME}
      - MONGODB_INITDB_ROOT_PASSWORD=${MONGO_PASSWORD}
    volumes:
      - lightrag_mongo_data:/data/db

volumes: # I have tried to avoid GUID-named (anonymous) volumes, but this isn't complete yet.
  lightrag_redis_data:
  lightrag_neo4j_import:
  lightrag_neo4j_data:
  lightrag_neo4j_backups:
  lightrag_etcd_data:
  lightrag_minio_data:
  lightrag_milvus_data:
  lightrag_mongo_data:
```
Environment variables file:
```dotenv
### This is a sample .env file

### Server Configuration
HOST=0.0.0.0
PORT=9621
WORKERS=4
NAMESPACE_PREFIX=lightrag
# Separates data from different LightRAG instances
MAX_GRAPH_NODES=1000
# Max nodes returned from graph retrieval
CORS_ORIGINS=http://localhost:3000,http://localhost:8080
# It runs on port 9621, as you can see in Docker Desktop

### Optional SSL Configuration
# SSL=true
# SSL_CERTFILE=/path/to/cert.pem
# SSL_KEYFILE=/path/to/key.pem

### Directory Configuration
# INPUT_DIR=<absolute_path_for_doc_input_dir>

### Ollama Emulating Model Tag
# OLLAMA_EMULATING_MODEL_TAG=latest

### Logging level
# LOG_LEVEL=INFO
# VERBOSE=False
# LOG_DIR=/path/to/log/directory # Log file directory path, defaults to current working directory
# LOG_MAX_BYTES=10485760 # Log file max size in bytes, defaults to 10MB
# LOG_BACKUP_COUNT=5 # Number of backup files to keep, defaults to 5

### Settings for RAG query
# HISTORY_TURNS=3
# COSINE_THRESHOLD=0.2
# TOP_K=60
# MAX_TOKEN_TEXT_CHUNK=4000
# MAX_TOKEN_RELATION_DESC=4000
# MAX_TOKEN_ENTITY_DESC=4000

### Settings for document indexing
ENABLE_LLM_CACHE_FOR_EXTRACT=true
# Enable LLM cache for entity extraction
SUMMARY_LANGUAGE=English
# It is probably best to do everything in English. If you need a different language, translate it to English first.
# You can always tweak these variables if you want.
CHUNK_SIZE=1200
CHUNK_OVERLAP_SIZE=100
# MAX_TOKEN_SUMMARY=500 # Max tokens for entity or relation summaries
# MAX_PARALLEL_INSERT=2 # Number of documents processed in parallel in one batch
# EMBEDDING_BATCH_NUM=32 # Number of chunks sent to the embedding model in one request
# EMBEDDING_FUNC_MAX_ASYNC=16 # Max concurrent embedding requests
# MAX_EMBED_TOKENS=8192

### LLM Configuration (use a valid host; for local services installed with Docker, you can use host.docker.internal)
TIMEOUT=150
# Timeout in seconds for the LLM; None for an infinite timeout
TEMPERATURE=0.5
MAX_ASYNC=4
# Max concurrent LLM requests
MAX_TOKENS=32768
# Max tokens sent to the LLM (less than the model's context size)

### OpenAI-like example
LLM_BINDING=openai
LLM_MODEL=gpt-4o-mini
# I would use mini because of cost limitations.
LLM_BINDING_HOST=https://api.openai.com/v1
# If you do not set OPENAI_API_KEY you will get an error, so I just put my API key in both variables.
LLM_BINDING_API_KEY=sk-
# OpenAI API key. Same key for both
OPENAI_API_KEY=sk-
# OpenAI API key. Same key for both

### Embedding Configuration (use a valid host; for local services installed with Docker, you can use host.docker.internal)
EMBEDDING_MODEL=text-embedding-3-small
# Low cost, good performance
EMBEDDING_DIM=1536
# The number of dimensions the embedding model returns
EMBEDDING_BINDING_API_KEY=your_api_key
# I have never set this, but I have also configured my LightRAG a bit differently in the lightrag_server.py file (described below).
EMBEDDING_BINDING=openai
EMBEDDING_BINDING_HOST=https://api.openai.com/v1

### Data storage selection
LIGHTRAG_KV_STORAGE=RedisKVStorage
LIGHTRAG_VECTOR_STORAGE=MilvusVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=Neo4JStorage
LIGHTRAG_DOC_STATUS_STORAGE=MongoDocStatusStorage

### Neo4j Configuration
NEO4J_URI=bolt://neo4j:7687
# Should match the name of the service
NEO4J_USERNAME=neo4j
# This can only be neo4j
NEO4J_PASSWORD=
# Can be anything you want, but make it reasonably strong. This is also what you log in with when checking your graph on Neo4j's port 7474.
NEO4J_DATABASE=neo4j
# This avoids the "Database name cannot be set" error, because the Community Edition only supports one graph database.
LOGFIRE_API_KEY=
# Logfire is an easy way to log calls, especially to OpenAI, but also to any other LLM or provider, and it's free.

### MongoDB Configuration
MONGO_URI=mongodb://admin:admin123@mongo:27017/
MONGO_USERNAME=admin
# Should match URI credentials
MONGO_PASSWORD=admin123
# Should match URI credentials
MONGO_DATABASE=LightRAG
# Just a standard name
# MONGODB_GRAPH=false # deprecated (keep for backward compatibility)

### Milvus Configuration
MILVUS_URI=http://milvus:19530
MILVUS_DB_NAME=lightrag
# Should match the database name created in lightrag_server.py (explained below).
MILVUS_USER=
# Can be anything you want
MILVUS_PASSWORD=
# Can be anything you want
# MILVUS_TOKEN=your_token

### Redis
REDIS_URI=redis://redis:6379

### For JWT Auth (LightRAG version 1.3.0)
AUTH_USERNAME=admin
# Login name
AUTH_PASSWORD=admin123
# Password
TOKEN_SECRET=adminadmin-LightRAG-API-Server
# JWT key
TOKEN_EXPIRE_HOURS=4
# Expiry duration

### For JWT Auth (LightRAG version 1.3.1)
AUTH_ACCOUNTS='admin:admin123,user1:pass456'
TOKEN_SECRET=Your-Key-For-LightRAG-API-Server
TOKEN_EXPIRE_HOURS=48
GUEST_TOKEN_EXPIRE_HOURS=24
JWT_ALGORITHM=HS256

### API key to access the LightRAG Server API
# LIGHTRAG_API_KEY=your-secure-api-key-here
# WHITELIST_PATHS=/health,/api/*
```
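Several comments in the .env file say that values "should match" each other: the MongoDB credentials inside MONGO_URI must equal MONGO_USERNAME/MONGO_PASSWORD, and the hosts in the URIs must be the compose service names. Below is a minimal stdlib-only sketch for checking those rules before running `docker compose up`. It is not part of LightRAG; `parse_env` and `check_env` are hypothetical helpers, the parser is deliberately simplified (one `KEY=value` per line, comments on their own lines), and it assumes LightRAG runs inside the compose network so that every URI should point at a service name.

```python
from urllib.parse import urlparse

# Hostnames the URIs are expected to use: the service names from the compose file.
COMPOSE_SERVICES = {"redis", "neo4j", "milvus", "mongo"}


def parse_env(text):
    """Parse simple KEY=value lines; blank lines and full-line comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. AUTH_ACCOUNTS='admin:admin123,...'
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_env(values):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    mongo = urlparse(values.get("MONGO_URI", ""))
    if mongo.username != values.get("MONGO_USERNAME"):
        problems.append("MONGO_URI user does not match MONGO_USERNAME")
    if mongo.password != values.get("MONGO_PASSWORD"):
        problems.append("MONGO_URI password does not match MONGO_PASSWORD")
    for key in ("NEO4J_URI", "MILVUS_URI", "REDIS_URI", "MONGO_URI"):
        host = urlparse(values.get(key, "")).hostname
        if host not in COMPOSE_SERVICES:
            problems.append(f"{key} host {host!r} is not a compose service name")
    if not values.get("NEO4J_PASSWORD"):
        problems.append("NEO4J_PASSWORD is empty")
    return problems
```

For example, `check_env(parse_env(open(".env").read()))` on the sample file returns a single complaint about the empty NEO4J_PASSWORD until you fill it in.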
Additional Code Changes
I have made a few small adjustments in the codebase that I believe will improve the setup experience:
Milvus Database Creation in lightrag_server.py:
To avoid errors where Milvus complains that the "lightrag" database doesn’t exist, I added the following snippet at the top of the file, alongside the imports:
```python
from pymilvus import connections, db

# Connect to Milvus
connections.connect(host="milvus", port=19530)

# Check if database 'lightrag' exists by listing available databases
if "lightrag" not in db.list_database():
    db.create_database("lightrag")
```
Note: Please ensure that the database name ("lightrag") matches what is specified in the .env file.
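To keep the database name in one place, the snippet could read it from the same .env values instead of hard-coding "lightrag". This is a minimal sketch under that assumption: it reuses MILVUS_URI and MILVUS_DB_NAME from the .env file and the same pymilvus calls as the snippet above (`connections.connect`, `db.list_database`, `db.create_database`); the helper names `milvus_target` and `ensure_milvus_database` are hypothetical and not part of LightRAG.

```python
import os
from urllib.parse import urlparse


def milvus_target(env=os.environ):
    """Return (host, port, db_name) derived from MILVUS_URI and MILVUS_DB_NAME."""
    uri = urlparse(env.get("MILVUS_URI", "http://milvus:19530"))
    host = uri.hostname or "milvus"
    port = uri.port or 19530
    return host, port, env.get("MILVUS_DB_NAME", "lightrag")


def ensure_milvus_database(env=os.environ):
    """Create the configured Milvus database if it does not exist yet."""
    # Imported lazily so milvus_target() works even without pymilvus installed.
    from pymilvus import connections, db

    host, port, db_name = milvus_target(env)
    connections.connect(host=host, port=port)
    if db_name not in db.list_database():
        db.create_database(db_name)
    return db_name
```

Calling `ensure_milvus_database()` once at startup would then create whatever database MILVUS_DB_NAME names, so the .env file and the server code cannot drift apart.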
RAG Initialization Update in lightrag_server.py:
I adjusted the initialization of the RAG component so that llm_model_func is explicitly set and the embedding_func parameter is provided; for this I import gpt_4o_mini_complete and openai_embed. The update looks like this:
This change addresses occasional errors when openai_embed isn’t explicitly passed despite being mentioned in the settings.
Neo4j Connection Pool Size in neo4j_impl.py:
To better accommodate higher workloads (which do exist for Neo4j specifically), I increased the default maximum connection pool size for Neo4j to 400 (which is the maximum for the community edition by default). This change is intended as a temporary measure until the batch feature is implemented, after which it can be reduced to 100.
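```python
import configparser
import os

config = configparser.ConfigParser()
config.read("config.ini", "utf-8")
# Set this a bit higher than expected workloads
MAX_CONNECTION_POOL_SIZE = int(
    os.environ.get(
        "NEO4J_MAX_CONNECTION_POOL_SIZE",
        config.get("neo4j", "connection_pool_size", fallback=400),
    )
)
```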
Please note that sometimes the Neo4j database container takes a little time to start up, so the lightrag container might log an "unable to connect" message; this is expected, and waiting a few moments should resolve the issue.
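If the "unable to connect" message gets in the way (for example in a script that runs right after `docker compose up`), one option is to wait until Neo4j's Bolt port accepts TCP connections. Below is a minimal stdlib sketch, assuming the 7687 port mapping from the compose file; `wait_for_port` is a hypothetical helper, and an open port only shows that the listener is up, not that authentication or the database itself is ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed again immediately; we only test reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

From the host machine this would be called as `wait_for_port("localhost", 7687, timeout=90)`; from another container on the compose network, the service name is used instead, as in `wait_for_port("neo4j", 7687)`.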
Docker Compose Commands
For convenience, here are some instructions to start and stop the containers:
To start everything:
Open a terminal (for example Windows PowerShell), navigate to the repository directory that contains the docker-compose file, and run:
```shell
docker compose -f <nameoffile.yml> up --build -d
```
Once everything has started, open Docker Desktop and click port 9621 (or browse to http://localhost:9621) to open the WebUI. The login username and password are set in the .env file.
To stop the containers:
```shell
docker compose -f <nameoffile.yml> down
```
To stop the containers and remove the volumes (this deletes the data stored in the named volumes):
```shell
docker compose -f <nameoffile.yml> down -v
```
I hope these changes help new users get started with lightRAG more quickly and smoothly. Thank you for all the work on this project, and good luck to everyone!
Additional Context
No response
I'm also fairly new to Milvus, so I asked o3-mini about it. In short, this was its conclusion:
MinIO’s Role:
Acts as the persistent, scalable, S3-compatible object storage solution for Milvus deployments.
Stores raw files, snapshots, and backups, allowing Milvus to concentrate on efficient vector processing.
Is MinIO Obligatory?
MinIO is included in the default Milvus deployment to simplify production setups, but it is not strictly mandatory. Alternative storage options can be used as long as they meet the requirements for durability and scalability.
So MinIO is well worth having, and I now understand why Milvus includes it in its own official docker-compose.yml.
I asked a bit more: with MinIO in place, and by linking image metadata or images to the graph nodes in a smart way, it might even be possible to add image search or image-embedding search to LightRAG in the future.