
[Feature Request]: Add Quickstart Docker Compose & .env Setup for lightRAG Deployment for OpenAI, Milvus, Redis, MongoDB & Neo4j #1356


frederikhendrix opened this issue Apr 11, 2025 · 2 comments
Labels: docker, enhancement (New feature or request), Server (LightRAG Server)

frederikhendrix (Contributor) commented Apr 11, 2025

Do you need to file a feature request?

  • I have searched the existing feature requests, and this feature request is not already filed.
  • I believe this is a legitimate feature request, not just a question or bug.

Feature Request Description

Issue/Feature Description:

First off, I want to express my gratitude for LightRAG: it's been a fantastic tool for my projects. I'd like to contribute a quickstart setup that makes it even easier for users to deploy LightRAG using Docker. This setup covers integration with OpenAI, Milvus (as the vector DB), Redis (for KV storage), MongoDB (for document storage), and Neo4j (for the graph DB).

Contribution Details

I propose adding a docker-compose file and an example .env file to the repository. These files will serve as a quickstart guide for getting LightRAG up and running with all the required storage services.

Docker Compose file:

services:
  lightrag:
    build: .
    ports:
      - "${PORT:-9621}:9621"
    volumes:
      - ./data/rag_storage:/app/data/rag_storage
      - ./data/inputs:/app/data/inputs
      - ./config.ini:/app/config.ini
      - ./.env:/app/.env
    env_file:
      - .env
    restart: unless-stopped
    depends_on:
      - redis
      - mongo
      - milvus
      - neo4j

  redis:
    image: redis:7.4.2-alpine3.21
    container_name: lightrag-server_redis
    restart: always
    ports:
      - "6379:6379"  # Exposes container's port 6379 on host's port 6379
    volumes:
      - lightrag_redis_data:/data

  neo4j:
    image: neo4j:5.26.4-community
    container_name: lightrag-server_neo4j-community
    restart: always
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_AUTH=${NEO4J_USERNAME}/${NEO4J_PASSWORD}
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_server_bolt_listen__address=0.0.0.0:7687
      - NEO4J_server_bolt_advertised__address=neo4j:7687
    volumes:
      - ./neo4j/plugins:/var/lib/neo4j/plugins  # Bind mount because Neo4j needs the APOC plugin file, which you have to download yourself.
      - lightrag_neo4j_import:/var/lib/neo4j/import
      - lightrag_neo4j_data:/data
      - lightrag_neo4j_backups:/backups

  etcd: # etcd, minio, and milvus are copied from Milvus's own docker-compose file.
    container_name: lightrag-server_milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - lightrag_etcd_data:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: lightrag-server_milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - lightrag_minio_data:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  milvus:
    container_name: lightrag-server_milvus-standalone
    image: milvusdb/milvus:v2.4.15
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - lightrag_milvus_data:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio

  mongo:
    image: mongodb/mongodb-community-server:7.0.8-ubi8
    container_name: lightrag-server_mongo-community
    restart: always
    ports:
      - "27017:27017"
    environment:
      - MONGODB_INITDB_ROOT_USERNAME=${MONGO_USERNAME}
      - MONGODB_INITDB_ROOT_PASSWORD=${MONGO_PASSWORD}
    volumes:
      - lightrag_mongo_data:/data/db

volumes: # Named volumes, to avoid the anonymous GUID-named volumes Docker creates otherwise; this cleanup isn't complete yet.
  lightrag_redis_data:
  lightrag_neo4j_import:
  lightrag_neo4j_data:
  lightrag_neo4j_backups:
  lightrag_etcd_data:
  lightrag_minio_data:
  lightrag_milvus_data:
  lightrag_mongo_data:

Environment variables file:

### This is a sample .env file

### Server Configuration
HOST=0.0.0.0
PORT=9621
WORKERS=4
NAMESPACE_PREFIX=lightrag  # separates data from different LightRAG instances
MAX_GRAPH_NODES=1000       # Max nodes returned from graph retrieval
CORS_ORIGINS=http://localhost:3000,http://localhost:8080 # The server itself runs on port 9621, as you can see in Docker Desktop

### Optional SSL Configuration
# SSL=true
# SSL_CERTFILE=/path/to/cert.pem
# SSL_KEYFILE=/path/to/key.pem

### Directory Configuration
# INPUT_DIR=<absolute_path_for_doc_input_dir>

### Ollama Emulating Model Tag
# OLLAMA_EMULATING_MODEL_TAG=latest

### Logging level
# LOG_LEVEL=INFO
# VERBOSE=False
# LOG_DIR=/path/to/log/directory  # Log file directory path, defaults to current working directory
# LOG_MAX_BYTES=10485760          # Log file max size in bytes, defaults to 10MB
# LOG_BACKUP_COUNT=5              # Number of backup files to keep, defaults to 5

### Settings for RAG query
# HISTORY_TURNS=3
# COSINE_THRESHOLD=0.2
# TOP_K=60
# MAX_TOKEN_TEXT_CHUNK=4000
# MAX_TOKEN_RELATION_DESC=4000
# MAX_TOKEN_ENTITY_DESC=4000

### Settings for document indexing
ENABLE_LLM_CACHE_FOR_EXTRACT=true    # Enable LLM cache for entity extraction
SUMMARY_LANGUAGE=English             # Keep everything in English if possible; if you need a different language, translate it to English first.

# You can always tweak these variables if you want.
CHUNK_SIZE=1200
CHUNK_OVERLAP_SIZE=100
# MAX_TOKEN_SUMMARY=500                # Max tokens for entity or relations summary
# MAX_PARALLEL_INSERT=2                # Number of documents processed in parallel in one batch

# EMBEDDING_BATCH_NUM=32               # Number of chunks sent to the embedding model in one request
# EMBEDDING_FUNC_MAX_ASYNC=16          # Max concurrent embedding requests
# MAX_EMBED_TOKENS=8192

### LLM Configuration (Use valid host. For local services installed with docker, you can use host.docker.internal)
TIMEOUT=150                            # Timeout in seconds for the LLM, None for infinite timeout
TEMPERATURE=0.5
MAX_ASYNC=4                            # Max concurrent LLM requests
MAX_TOKENS=32768                       # Max tokens sent to the LLM (less than the model's context size)

### OpenAI-compatible example
LLM_BINDING=openai
LLM_MODEL=gpt-4o-mini                   # I would use the mini model to limit costs.
LLM_BINDING_HOST=https://api.openai.com/v1

# If you do not include OPENAI_API_KEY you will get an error, so I just set my API key in both variables.
LLM_BINDING_API_KEY=sk-                 # OpenAI API key (same key for both)
OPENAI_API_KEY=sk-                      # OpenAI API key (same key for both)

### Embedding Configuration (Use valid host. For local services installed with docker, you can use host.docker.internal)
EMBEDDING_MODEL=text-embedding-3-small      # Low cost, good performance
EMBEDDING_DIM=1536                          # The number of dimensions the embedding model returns
EMBEDDING_BINDING_API_KEY=your_api_key      # I have never set this, but I have also configured my LightRAG a bit differently in lightrag_server.py (see Additional Code Changes below).

EMBEDDING_BINDING=openai
EMBEDDING_BINDING_HOST=https://api.openai.com/v1

### Data storage selection
LIGHTRAG_KV_STORAGE=RedisKVStorage
LIGHTRAG_VECTOR_STORAGE=MilvusVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=Neo4JStorage
LIGHTRAG_DOC_STATUS_STORAGE=MongoDocStatusStorage

### Neo4j Configuration
NEO4J_URI=bolt://neo4j:7687         # Should match the compose service name
NEO4J_USERNAME=neo4j                # This can only be neo4j
NEO4J_PASSWORD=                     # Can be anything you want, but make it reasonably strong. This is also what you use to log in to the Neo4j browser on port 7474.
NEO4J_DATABASE=neo4j                # Avoids the "Database name cannot be set" error, because the community edition only supports one graph DB.
LOGFIRE_API_KEY=                    # Logfire is an easy way to log calls, especially to OpenAI, but also to any other LLM provider, and it's free.

### MongoDB Configuration
MONGO_URI=mongodb://admin:admin123@mongo:27017/
MONGO_USERNAME=admin                # Should match URI credentials
MONGO_PASSWORD=admin123             # Should match URI credentials
MONGO_DATABASE=LightRAG             # Just a standard name
# MONGODB_GRAPH=false # deprecated (keep for backward compatibility)

### Milvus Configuration
MILVUS_URI=http://milvus:19530
MILVUS_DB_NAME=lightrag             # Should match the database name created in lightrag_server.py (see Additional Code Changes below).
MILVUS_USER=                        # Can be anything you want
MILVUS_PASSWORD=                    # Can be anything you want
# MILVUS_TOKEN=your_token

### Redis
REDIS_URI=redis://redis:6379

### For JWT Auth (LightRAG version 1.3.0)
AUTH_USERNAME=admin             # login name
AUTH_PASSWORD=admin123          # password
TOKEN_SECRET=adminadmin-LightRAG-API-Server           # JWT key
TOKEN_EXPIRE_HOURS=4            # expire duration

### For JWT Auth (LightRAG version 1.3.1)
AUTH_ACCOUNTS='admin:admin123,user1:pass456'
TOKEN_SECRET=Your-Key-For-LightRAG-API-Server
TOKEN_EXPIRE_HOURS=48
GUEST_TOKEN_EXPIRE_HOURS=24
JWT_ALGORITHM=HS256

### API-Key to access LightRAG Server API
# LIGHTRAG_API_KEY=your-secure-api-key-here
# WHITELIST_PATHS=/health,/api/*
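
For convenience, here is a small smoke test you can run from the host before starting LightRAG itself. This is only a sketch based on the compose file and sample .env above: it assumes the published localhost ports, the admin/admin123 MongoDB credentials, and whatever you set as NEO4J_PASSWORD; adjust the values if yours differ. (Inside the compose network you would use the service names instead of localhost.)

# smoke_test.py - check that all four storage backends are reachable from the host.
# Values mirror the sample compose file and .env above; adjust as needed.
import redis
from pymongo import MongoClient
from neo4j import GraphDatabase
from pymilvus import connections, utility

# Redis (KV storage)
r = redis.Redis(host="localhost", port=6379)
assert r.ping(), "Redis did not answer PING"

# MongoDB (document storage) - credentials from MONGO_URI in .env
mongo = MongoClient("mongodb://admin:admin123@localhost:27017/")
mongo.admin.command("ping")

# Neo4j (graph storage) - use the password you set in NEO4J_PASSWORD
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<your-password>"))
driver.verify_connectivity()
driver.close()

# Milvus (vector storage)
connections.connect(host="localhost", port=19530)
print("Milvus server version:", utility.get_server_version())

print("All storage backends are reachable.")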

Additional Code Changes

I have made a few small adjustments in the codebase that I believe will improve the setup experience:

  1. Milvus Database Creation in lightrag_server.py:

    To avoid errors where Milvus complains that the "lightrag" database doesn't exist, I added the following snippet right after the imports at the top of the file:

    from pymilvus import connections, db
    
    # Connect to Milvus
    connections.connect(host="milvus", port=19530)
    
    # Check if database 'lightrag' exists by listing available databases
    if "lightrag" not in db.list_database():
        db.create_database("lightrag")

    Note: Please ensure that the database name ("lightrag") matches what is specified in the .env file.

  2. RAG Initialization Update in lightrag_server.py:

    I adjusted the initialization of the RAG component so that the llm_model_func is explicitly set and the embedding_func parameter is provided. The update looks like this:

    # Initialize RAG
    if args.llm_binding in ["openai"]:
        rag = LightRAG(
            working_dir=args.working_dir,
            llm_model_func=gpt_4o_mini_complete,
            llm_model_name=args.llm_model,
            llm_model_max_async=args.max_async,
            llm_model_max_token_size=args.max_tokens,
            chunk_token_size=int(args.chunk_size),
            chunk_overlap_token_size=int(args.chunk_overlap_size),
            llm_model_kwargs={
                "host": args.llm_binding_host,
                "timeout": args.timeout,
                "options": {"num_ctx": args.max_tokens},
                "api_key": args.llm_binding_api_key,
            }
            if args.llm_binding == "lollms" or args.llm_binding == "ollama"
            else {},
            embedding_func=openai_embed,
            kv_storage=args.kv_storage,
            graph_storage=args.graph_storage,
            vector_storage=args.vector_storage,
            doc_status_storage=args.doc_status_storage,
            vector_db_storage_cls_kwargs={
                "cosine_better_than_threshold": args.cosine_threshold
            },
            enable_llm_cache_for_entity_extract=args.enable_llm_cache_for_extract,
            embedding_cache_config={
                "enabled": True,
                "similarity_threshold": 0.95,
                "use_llm_check": False,
            },
            namespace_prefix=args.namespace_prefix,
            auto_manage_storages_states=False,
            max_parallel_insert=args.max_parallel_insert,
        )
    
     
    # Add routes
    app.include_router(create_document_routes(rag, doc_manager, api_key))

    I import gpt_4o_mini_complete and openai_embed from:

    from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

    This change addresses occasional errors when openai_embed isn’t explicitly passed despite being mentioned in the settings.

  3. Neo4j Connection Pool Size in neo4j_impl.py:

    To better accommodate higher workloads (which Neo4j in particular sees), I increased the default maximum connection pool size for Neo4j to 400 (the default maximum for the community edition). This change is intended as a temporary measure until the batch feature is implemented, after which it can be reduced to 100.

    # Set this a bit higher than expected workloads
    MAX_CONNECTION_POOL_SIZE = int(
        os.environ.get(
            "NEO4J_MAX_CONNECTION_POOL_SIZE",
            config.get("neo4j", "connection_pool_size", fallback=400),
        )
    )

    Please note that the Neo4j container sometimes takes a little time to start up, so the lightrag container might log an "unable to connect" message at first; this is expected, and it resolves itself after a few moments. If you want to handle it explicitly, see the retry sketch below.
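
    For reference, a small retry loop before the Neo4j storage is initialized would also cover that startup race. This is only a sketch (wait_for_neo4j is my own helper name, not part of LightRAG), using the official neo4j Python driver:

    import os
    import time
    from neo4j import GraphDatabase
    from neo4j.exceptions import ServiceUnavailable

    def wait_for_neo4j(uri, user, password, retries=10, delay=3.0):
        """Block until Neo4j accepts Bolt connections, or give up after `retries` tries."""
        for attempt in range(1, retries + 1):
            try:
                driver = GraphDatabase.driver(uri, auth=(user, password))
                driver.verify_connectivity()
                driver.close()
                return
            except ServiceUnavailable:
                print(f"Neo4j not ready (attempt {attempt}/{retries}); retrying in {delay}s")
                time.sleep(delay)
        raise RuntimeError(f"Neo4j at {uri} did not come up after {retries} attempts")

    wait_for_neo4j(
        os.environ.get("NEO4J_URI", "bolt://neo4j:7687"),
        os.environ.get("NEO4J_USERNAME", "neo4j"),
        os.environ["NEO4J_PASSWORD"],
    )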


Docker Compose Commands

For convenience, here are some instructions to start and stop the containers:

  • To start everything:
    Open Windows PowerShell, navigate to the repository directory where the docker-compose file is located, and run:
    docker compose -f <nameoffile.yml> up --build -d
    

After everything has started, check Docker Desktop and click on port 9621 to open the WebUI; the username and password are in the .env settings. You can also verify the server from the command line with the health check snippet after these commands.

  • To stop the containers:

    docker compose -f <nameoffile.yml> down
    
  • To stop the containers and remove the volumes:

    docker compose -f <nameoffile.yml> down -v
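
To verify the server from a script instead of the WebUI, you can poll the /health endpoint (the path whitelisted in the sample .env). A minimal sketch, assuming the default port 9621:

# check_server.py - poll the LightRAG server's health endpoint.
import requests

resp = requests.get("http://localhost:9621/health", timeout=10)
resp.raise_for_status()
print("LightRAG server is up, status:", resp.status_code)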
    

I hope these changes help new users get started with LightRAG more quickly and smoothly. Thank you for all the work on this project, and good luck to everyone!

Additional Context

No response

danielaskdd (Collaborator) commented:

Thank you for sharing. Could you elaborate on the purpose of MinIO and its advantages when integrated with Milvus?

frederikhendrix (Contributor, Author) commented Apr 12, 2025

I'm also fairly new to Milvus, so I had a chat with o3-mini; in short, the conclusion was:

  • MinIO’s Role:

    • Acts as the persistent, scalable, S3-compatible object storage solution for Milvus deployments.
    • Stores raw files, snapshots, and backups, allowing Milvus to concentrate on efficient vector processing.
  • Is MinIO Obligatory?

    • MinIO is included in the default Milvus deployment to simplify production setups, but it is not strictly mandatory. Alternative storage options can be used as long as they meet the requirements for durability and scalability.

So basically MinIO is great to have and I now understand why Milvus includes it in their own official docker-compose.yml.

Also, here is the GitHub link to the docker-compose.yml for Milvus standalone:
Milvus Standalone Github Yaml File

I asked a bit more, and with MinIO in place, if you connect the metadata/images to the graph nodes in a smart way, you might even be able to add image search or image-embedded search to LightRAG in the future.
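
If you're curious what Milvus actually stores there, you can peek at the buckets with the minio Python client. A minimal sketch, assuming the default minioadmin credentials from the compose file above:

# list_minio.py - list the buckets and objects Milvus keeps in MinIO.
from minio import Minio

client = Minio("localhost:9000",
               access_key="minioadmin",
               secret_key="minioadmin",
               secure=False)  # the compose setup serves plain HTTP

for bucket in client.list_buckets():
    print(bucket.name)
    # list_objects is lazy; recursive=True walks the whole bucket
    for obj in client.list_objects(bucket.name, recursive=True):
        print("  ", obj.object_name)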
