
[Bug]: LightRAG does not work with gpt-4o-mini #1348

Open
JoedNgangmeni opened this issue Apr 11, 2025 · 21 comments
Labels
bug Something isn't working

Comments

@JoedNgangmeni

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

When I run the code with Azure OpenAI gpt-4o-mini, it either gives me an error saying that binding failed (LightRAG API) or never comes out of some loop while it's indexing a file (azure_open_ai_demo code).

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

Paste your config here

Logs and screenshots

No response

Additional Information

  • LightRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
JoedNgangmeni added the bug label Apr 11, 2025
@danielaskdd
Collaborator

Check your .env settings for Azure OpenAI. You'd better try the LightRAG Server first.

@JoedNgangmeni
Author

I tried the server. It kept binding to Ollama. Then when I removed all Ollama references, I got the error that it fails to bind with gpt-4o-mini.

@BireleyX

BireleyX commented Apr 11, 2025

It should work. I've been using Azure 4o-mini and 4o with LightRAG for months now.
Check your API key and endpoint.

I've just deployed o3-mini in Azure and am testing it now...

Make sure these .env parameters are properly set:
AZURE_OPENAI_API_VERSION=xxx
AZURE_OPENAI_DEPLOYMENT=xxxx
AZURE_OPENAI_API_KEY=xxx
AZURE_OPENAI_ENDPOINT=xxx

AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
AZURE_EMBEDDING_API_VERSION=2024-10-21

If the .env settings are correct, then you might want to check your network.
If you are using an enterprise account and you're not on your company's network, your IT may have
set limitations; you'd probably get a virtual network error like I did.
You can confirm this by logging into Azure Studio on your home internet and using the playground to
chat with your 4o-mini model. If you're not allowed, then that's probably the cause.
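
A quick way to confirm the key and endpoint outside of LightRAG is a one-off call with the openai SDK. This is a rough sketch, not from the thread; it assumes the same AZURE_OPENAI_* variables as above, with AZURE_OPENAI_DEPLOYMENT holding your deployment name (e.g. "gpt-4o-mini"):

# Rough sketch: verify the Azure key/endpoint outside LightRAG.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)
resp = client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # deployment name, not model family
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

If this call fails from your machine but works from inside the company network, that points to the virtual-network restriction described above.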

@JoedNgangmeni
Author

What version are you using?

I have tried 3 or 4 times with 1.3.1 and 1.3.

It's not the network; using the same keys and everything, I was able to use GraphRAG from Microsoft.

@danielaskdd
Collaborator

https://github.com/HKUDS/LightRAG/blob/main/lightrag/api/README.md#for-azure-openai-backend

# Azure OpenAI Configuration in .env:
LLM_BINDING=azure_openai
LLM_BINDING_HOST=your-azure-endpoint
LLM_MODEL=your-model-deployment-name
LLM_BINDING_API_KEY=your-azure-api-key
### API version is optional, defaults to latest version
AZURE_OPENAI_API_VERSION=2024-08-01-preview

### If using Azure OpenAI for embeddings
EMBEDDING_BINDING=azure_openai
EMBEDDING_MODEL=your-embedding-deployment-name

@BireleyX

BireleyX commented Apr 11, 2025

I used it under 1.2.6, and just today I updated to 1.3.1. I'm testing both o3-mini and gpt-4o-mini to regenerate my database.
It works OK for both.

I also used lightrag_azure_openai_demo.py to kick-off my test under 1.3.1.

When this line is executed:
asyncio.run(test_funcs())

does your console display a response... something like:
Response of llm_model_func: I am an AI language model created by OpenAI, designed to assist with a wide range of questions and tasks by providing information, answering queries, and engaging in conversation. How can I help you today?

*I slightly modified the prompt and text inside the test_funcs function

@BireleyX

BireleyX commented Apr 11, 2025

Here is my .env file (take note, I deployed the gpt-4o-mini model with deployment name "gpt-4o-mini" in Azure):

### This is sample file of .env

### Server Configuration
# HOST=0.0.0.0
# PORT=9621
# WORKERS=2
# CORS_ORIGINS=http://localhost:3000,http://localhost:8080
WEBUI_TITLE='Graph RAG Engine'
WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"

### Optional SSL Configuration
# SSL=true
# SSL_CERTFILE=/path/to/cert.pem
# SSL_KEYFILE=/path/to/key.pem

### Directory Configuration (defaults to current working directory)
WORKING_DIR="\\sgnt-dev-sgvt01\d$\GRAPES_DB"
INPUT_DIR="\\sgnt-dev-sgvt01\d$\RAG_INGEST\IN"

### Ollama Emulating Model Tag
# OLLAMA_EMULATING_MODEL_TAG=latest

### Max nodes returned from graph retrieval
# MAX_GRAPH_NODES=1000

### Logging level
LOG_LEVEL=DEBUG
VERBOSE=False
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5
### Logfile location (defaults to current working directory)
# LOG_DIR=/path/to/log/directory

### Settings for RAG query
HISTORY_TURNS=3
COSINE_THRESHOLD=0.2
TOP_K=60
MAX_TOKEN_TEXT_CHUNK=4000
MAX_TOKEN_RELATION_DESC=4000
MAX_TOKEN_ENTITY_DESC=4000

### Settings for document indexing
SUMMARY_LANGUAGE=English
CHUNK_SIZE=1200
CHUNK_OVERLAP_SIZE=100

### Max tokens for entity or relations summary

MAX_TOKEN_SUMMARY="5000"


### Number of documents processed in parallel in one batch
MAX_PARALLEL_INSERT=2

### Num of chunks send to Embedding in single request
# EMBEDDING_BATCH_NUM=32
### Max concurrency requests for Embedding
# EMBEDDING_FUNC_MAX_ASYNC=16
# MAX_EMBED_TOKENS=8192

### LLM Configuration
### Time out in seconds for LLM, None for infinite timeout
TIMEOUT=150
### Some models like o1-mini require temperature to be set to 1
TEMPERATURE=0.5
### Max concurrency requests of LLM
MAX_ASYNC=4
### Max tokens send to LLM (less than context size of the model)
MAX_TOKENS=32768
ENABLE_LLM_CACHE=true
ENABLE_LLM_CACHE_FOR_EXTRACT=true

### Ollama example (For local services installed with docker, you can use host.docker.internal as host)
# LLM_BINDING=ollama
# LLM_MODEL=mistral-nemo:latest
# LLM_BINDING_API_KEY=your_api_key
# LLM_BINDING_HOST=http://localhost:11434

### OpenAI alike example
# LLM_BINDING=openai
# LLM_MODEL=gpt-4o
# LLM_BINDING_HOST=https://api.openai.com/v1
# LLM_BINDING_API_KEY=your_api_key
### lollms example
# LLM_BINDING=lollms
# LLM_MODEL=mistral-nemo:latest
# LLM_BINDING_HOST=http://localhost:9600
# LLM_BINDING_API_KEY=your_api_key

### Embedding Configuration (Use valid host. For local services installed with docker, you can use host.docker.internal)
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
# EMBEDDING_BINDING_API_KEY=your_api_key
### ollama example
EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434
### OpenAI alike example
# EMBEDDING_BINDING=openai
# LLM_BINDING_HOST=https://api.openai.com/v1
### Lollms example
# EMBEDDING_BINDING=lollms
# EMBEDDING_BINDING_HOST=http://localhost:9600

### Optional for Azure (LLM_BINDING_HOST, LLM_BINDING_API_KEY take priority)
# LLM_MODEL="gpt-4o-mini"
# LLM_BINDING="azure_openai"
# AZURE_OPENAI_API_VERSION= "2024-12-01-preview"
# AZURE_OPENAI_DEPLOYMENT="o3-mini"
AZURE_OPENAI_API_VERSION= "2024-12-01-preview"
AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
AZURE_OPENAI_API_KEY="**************************"
AZURE_OPENAI_ENDPOINT="https://***************.openai.azure.com/"

AZURE_EMBEDDING_DEPLOYMENT="text-embedding-3-large"
AZURE_EMBEDDING_API_VERSION="2024-12-01-preview"

### Data storage selection
LIGHTRAG_KV_STORAGE=JsonKVStorage
LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage

### TiDB Configuration (Deprecated)
# TIDB_HOST=localhost
# TIDB_PORT=4000
# TIDB_USER=your_username
# TIDB_PASSWORD='your_password'
# TIDB_DATABASE=your_database
### Separating all data from different LightRAG instances (deprecated)
# TIDB_WORKSPACE=default

### PostgreSQL Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD='your_password'
POSTGRES_DATABASE=your_database
### Separating all data from different LightRAG instances (deprecated)
# POSTGRES_WORKSPACE=default

### Independent AGM Configuration (not for AGM embedded in PostgreSQL)
AGE_POSTGRES_DB=
AGE_POSTGRES_USER=
AGE_POSTGRES_PASSWORD=
AGE_POSTGRES_HOST=
# AGE_POSTGRES_PORT=8529

# AGE Graph Name (applies to PostgreSQL and independent AGM)
### AGE_GRAPH_NAME is deprecated
# AGE_GRAPH_NAME=lightrag

### Neo4j Configuration
NEO4J_URI=neo4j+s://xxxxxxxx.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD='your_password'

### MongoDB Configuration
MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG
### Separating all data from different LightRAG instances (deprecated)
# MONGODB_GRAPH=false

### Milvus Configuration
MILVUS_URI=http://localhost:19530
MILVUS_DB_NAME=lightrag
# MILVUS_USER=root
# MILVUS_PASSWORD=your_password
# MILVUS_TOKEN=your_token

### Qdrant
QDRANT_URL=http://localhost:16333
# QDRANT_API_KEY=your-api-key

### Redis
REDIS_URI=redis://localhost:6379

### For JWT Auth
# AUTH_ACCOUNTS='admin:admin123,user1:pass456'
# TOKEN_SECRET=Your-Key-For-LightRAG-API-Server
# TOKEN_EXPIRE_HOURS=48
# GUEST_TOKEN_EXPIRE_HOURS=24
# JWT_ALGORITHM=HS256

### API-Key to access LightRAG Server API
# LIGHTRAG_API_KEY=your-secure-api-key-here
# WHITELIST_PATHS=/health,/api/*

@danielaskdd
Collaborator

The LLM_BINDING environment variable controls the LLM API mode. For Azure OpenAI, you should set this:

LLM_BINDING=azure_openai

@BireleyX

BireleyX commented Apr 11, 2025

The LLM_BINDING environment variable controls the LLM API mode. For Azure OpenAI, you should set this:

LLM_BINDING=azure_openai

Hmm... I tried running with that parameter disabled (commented out), but both my tests are running...
I also don't have Ollama running...

@danielaskdd
Collaborator

The Ollama problem is because the default embedding binding is Ollama; you should also set it like:

EMBEDDING_BINDING=azure_openai
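
Putting the two settings together with the README snippet above, a minimal Azure-only .env sketch would look roughly like this (deployment names, endpoint, and key are placeholders; the EMBEDDING_DIM value is an assumption for text-embedding-3-large):

### Minimal Azure-only sketch (placeholders)
LLM_BINDING=azure_openai
LLM_MODEL=your-model-deployment-name
LLM_BINDING_HOST=your-azure-endpoint
LLM_BINDING_API_KEY=your-azure-api-key
AZURE_OPENAI_API_VERSION=2024-08-01-preview

EMBEDDING_BINDING=azure_openai
EMBEDDING_MODEL=your-embedding-deployment-name
### EMBEDDING_DIM should match the deployment (e.g. 3072 for text-embedding-3-large)
EMBEDDING_DIM=3072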

@BireleyX

BireleyX commented Apr 11, 2025

It seems I'm mistaken. I reused my terminal environment when running the tests,
and initially I did set LLM_BINDING and EMBEDDING_BINDING, but the demo code used

load_dotenv()

so the values never got overwritten.
After using a new terminal, I did see some problems when these params were commented out.

@danielaskdd
Collaborator

load_dotenv() ensures the OS environment variable takes precedence over the .env file configuration.

@BireleyX

BireleyX commented Apr 11, 2025

load_dotenv() ensures the OS environment variable takes precedence over the .env file configuration.

Actually no... it loads the .env into the OS environment.

Image

And if you set the "override" parameter to true, any existing value of the same parameter in the OS environment will be overwritten with what is inside the .env.

@danielaskdd
Collaborator

The load_dotenv() function preserves existing OS environment variables by design, which explains why your .env file modifications aren't being applied.

@BireleyX

The load_dotenv() function preserves existing OS environment variables by design, which explains why your .env file modifications aren't being applied.

They are preserved because the "override" parameter is False by default.
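
For reference, a minimal sketch of the two behaviours with python-dotenv, assuming a hypothetical variable MY_VAR and a .env file containing MY_VAR=from-file:

# Minimal sketch of python-dotenv precedence (MY_VAR is a made-up example variable).
import os
from dotenv import load_dotenv

os.environ["MY_VAR"] = "from-shell"   # value already present in the OS environment

load_dotenv()                          # default override=False: the existing OS value wins
print(os.getenv("MY_VAR"))             # -> from-shell

load_dotenv(override=True)             # .env value replaces the existing OS value
print(os.getenv("MY_VAR"))             # -> from-file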

@danielaskdd
Collaborator

If you deploy LightRAG in Docker, not overriding is a must.

@JoedNgangmeni
Author

Thank you both @danielaskdd, @BireleyX for the comments.
I did not change any file locations.
I added override to load_dotenv().

test_funcs() seems to work, but when it starts looking at the document it just goes on forever.
I checked my Azure OpenAI metrics and it looks like calls are being made.
I don't know why this is happening.

Here are the metrics

Image

Here is the terminal output. I stopped it after a while:

Image

Here is my .env:

### This is sample file of .env

### Server Configuration
# HOST=0.0.0.0
# PORT=9621
# WORKERS=2
# CORS_ORIGINS=http://localhost:3000,http://localhost:8080
WEBUI_TITLE='Graph RAG Engine'
WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"

### Optional SSL Configuration
# SSL=true
# SSL_CERTFILE=/path/to/cert.pem
# SSL_KEYFILE=/path/to/key.pem

### Directory Configuration (defaults to current working directory)
# WORKING_DIR=<absolute_path_for_working_dir>
# INPUT_DIR=<absolute_path_for_doc_input_dir>

### Ollama Emulating Model Tag
# OLLAMA_EMULATING_MODEL_TAG=latest

### Max nodes returned from graph retrieval
# MAX_GRAPH_NODES=1000

### Logging level
LOG_LEVEL=INFO
VERBOSE=True
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5
### Logfile location (defaults to current working directory)
# LOG_DIR=/path/to/log/directory

### Settings for RAG query
HISTORY_TURNS=3
COSINE_THRESHOLD=0.2
TOP_K=60
MAX_TOKEN_TEXT_CHUNK=4000
MAX_TOKEN_RELATION_DESC=4000
MAX_TOKEN_ENTITY_DESC=4000

### Settings for document indexing
SUMMARY_LANGUAGE=English
CHUNK_SIZE=1200
CHUNK_OVERLAP_SIZE=100

### Number of documents processed in parallel in one batch
MAX_PARALLEL_INSERT=2

### Max tokens for entity/relations description after merge
MAX_TOKEN_SUMMARY=5000
### Number of entities/edges to trigger LLM re-summary on merge (at least 3 is recommended)
# FORCE_LLM_SUMMARY_ON_MERGE=6

### Num of chunks send to Embedding in single request
# EMBEDDING_BATCH_NUM=32
### Max concurrency requests for Embedding
# EMBEDDING_FUNC_MAX_ASYNC=16
# MAX_EMBED_TOKENS=8192

### LLM Configuration
### Time out in seconds for LLM, None for infinite timeout
TIMEOUT=150
### Some models like o1-mini require temperature to be set to 1
TEMPERATURE=0.5
### Max concurrency requests of LLM
MAX_ASYNC=4
### Max tokens send to LLM (less than context size of the model)
MAX_TOKENS=32768
ENABLE_LLM_CACHE=true
ENABLE_LLM_CACHE_FOR_EXTRACT=true

### Ollama example (For local services installed with docker, you can use host.docker.internal as host)
# LLM_BINDING=ollama
# LLM_MODEL=mistral-nemo:latest
# LLM_BINDING_API_KEY=your_api_key
# LLM_BINDING_HOST=http://localhost:11434

### OpenAI alike example
# LLM_BINDING=openai
# LLM_MODEL=gpt-4o
# LLM_BINDING_HOST=https://api.openai.com/v1
# LLM_BINDING_API_KEY=your_api_key
### lollms example
# LLM_BINDING=lollms
# LLM_MODEL=mistral-nemo:latest
# LLM_BINDING_HOST=http://localhost:9600
# LLM_BINDING_API_KEY=your_api_key

### Embedding Configuration (Use valid host. For local services installed with docker, you can use host.docker.internal)
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
# EMBEDDING_BINDING_API_KEY=your_api_key
### ollama example
EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434
### OpenAI alike example
# EMBEDDING_BINDING=openai
# LLM_BINDING_HOST=https://api.openai.com/v1
### Lollms example
# EMBEDDING_BINDING=lollms
# EMBEDDING_BINDING_HOST=http://localhost:9600

### Optional for Azure (LLM_BINDING_HOST, LLM_BINDING_API_KEY take priority)
LLM_BINDING=azure_openai
AZURE_OPENAI_API_VERSION="2024-12-01-preview"
AZURE_OPENAI_DEPLOYMENT="o3-mini"
AZURE_OPENAI_API_KEY="****************************"
AZURE_OPENAI_ENDPOINT="****************************.azure.com/"

AZURE_EMBEDDING_DEPLOYMENT="text-embedding-3-large"
AZURE_EMBEDDING_API_VERSION="2024-12-01-preview"

### Data storage selection
LIGHTRAG_KV_STORAGE=JsonKVStorage
LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage

### TiDB Configuration (Deprecated)
# TIDB_HOST=localhost
# TIDB_PORT=4000
# TIDB_USER=your_username
# TIDB_PASSWORD='your_password'
# TIDB_DATABASE=your_database
### Separating all data from different LightRAG instances (deprecated)
# TIDB_WORKSPACE=default

### PostgreSQL Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD='your_password'
POSTGRES_DATABASE=your_database
### Separating all data from different LightRAG instances (deprecated)
# POSTGRES_WORKSPACE=default

### Independent AGM Configuration (not for AGM embedded in PostgreSQL)
AGE_POSTGRES_DB=
AGE_POSTGRES_USER=
AGE_POSTGRES_PASSWORD=
AGE_POSTGRES_HOST=
# AGE_POSTGRES_PORT=8529

# AGE Graph Name (applies to PostgreSQL and independent AGM)
### AGE_GRAPH_NAME is deprecated
# AGE_GRAPH_NAME=lightrag

### Neo4j Configuration
NEO4J_URI=neo4j+s://xxxxxxxx.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD='your_password'

### MongoDB Configuration
MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG
### Separating all data from different LightRAG instances (deprecated)
# MONGODB_GRAPH=false

### Milvus Configuration
MILVUS_URI=http://localhost:19530
MILVUS_DB_NAME=lightrag
# MILVUS_USER=root
# MILVUS_PASSWORD=your_password
# MILVUS_TOKEN=your_token

### Qdrant
QDRANT_URL=http://localhost:16333
# QDRANT_API_KEY=your-api-key

### Redis
REDIS_URI=redis://localhost:6379

### For JWT Auth
# AUTH_ACCOUNTS='admin:admin123,user1:pass456'
# TOKEN_SECRET=Your-Key-For-LightRAG-API-Server
# TOKEN_EXPIRE_HOURS=48
# GUEST_TOKEN_EXPIRE_HOURS=24
# JWT_ALGORITHM=HS256

### API-Key to access LightRAG Server API
# LIGHTRAG_API_KEY=your-secure-api-key-here
# WHITELIST_PATHS=/health,/api/*

@JoedNgangmeni
Author

I get the same problem when I try with gpt-4o-mini.

Here is my azure demo file:

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.utils import EmbeddingFunc
import numpy as np
from dotenv import load_dotenv
import logging
from openai import AzureOpenAI
from lightrag.kg.shared_storage import initialize_pipeline_status
from tqdm import tqdm

logging.basicConfig(level=logging.INFO)

load_dotenv(override=True)

AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")

AZURE_EMBEDDING_DEPLOYMENT = os.getenv("AZURE_EMBEDDING_DEPLOYMENT")
AZURE_EMBEDDING_API_VERSION = os.getenv("AZURE_EMBEDDING_API_VERSION")

WORKING_DIR = "./dev_claims_3o_model"

if os.path.exists(WORKING_DIR):
    import shutil

    shutil.rmtree(WORKING_DIR)

os.mkdir(WORKING_DIR)


async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if history_messages:
        messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    chat_completion = client.chat.completions.create(
        model=AZURE_OPENAI_DEPLOYMENT,  # model = "deployment_name".
        messages=messages,
        # temperature=kwargs.get("temperature", 0), # This is not supported for o3-mini
        top_p=kwargs.get("top_p", 1),
        n=kwargs.get("n", 1),
    )
    return chat_completion.choices[0].message.content


async def embedding_func(texts: list[str]) -> np.ndarray:
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_EMBEDDING_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )
    embedding = client.embeddings.create(model=AZURE_EMBEDDING_DEPLOYMENT, input=texts)

    embeddings = [item.embedding for item in embedding.data]
    return np.array(embeddings)


async def test_funcs():
    result = await llm_model_func("How are you?")
    print("Resposta do llm_model_func: ", result)

    result = await embedding_func(["How are you?"])
    print("Resultado do embedding_func: ", result.shape)
    print("Dimensão da embedding: ", result.shape[1])


asyncio.run(test_funcs())

embedding_dimension = 3072


async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=embedding_dimension,
            max_token_size=8192,
            func=embedding_func,
        ),
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag


def main():
    rag = asyncio.run(initialize_rag())
    dataPath = "dev_claims_data/input"

    for file in tqdm(os.listdir(dataPath)):
        fPath = os.path.join(dataPath, file)
        
        # Read each file with a context manager so the handle is closed after insert
        with open(fPath, encoding="utf-8") as book:
            rag.insert(book.read())

    query_text = "What are the main themes?"

    print("Result (Naive):")
    print(rag.query(query_text, param=QueryParam(mode="naive")))

    print("\nResult (Local):")
    print(rag.query(query_text, param=QueryParam(mode="local")))

    print("\nResult (Global):")
    print(rag.query(query_text, param=QueryParam(mode="global")))

    print("\nResult (Hybrid):")
    print(rag.query(query_text, param=QueryParam(mode="hybrid")))


if __name__ == "__main__":
    main()

@danielaskdd
Collaborator

It appears the LLM request failed for some reason. Please verify your implementation of llm_model_func by testing it separately.
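
One way to surface the failure is to wrap the function with logging and pass the wrapper to LightRAG instead. This is a rough sketch, reusing the names from the demo script above:

# Rough sketch: log any exception raised by the demo's llm_model_func during indexing.
import logging

async def logged_llm_model_func(prompt, **kwargs) -> str:
    try:
        return await llm_model_func(prompt, **kwargs)
    except Exception:
        logging.exception("LLM request failed")
        raise

Passing logged_llm_model_func as llm_model_func to LightRAG should make the underlying Azure error visible in the console instead of the indexing loop appearing to hang.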

@BireleyX

BireleyX commented Apr 24, 2025

@JoedNgangmeni you got it working already?

Here's my llm_model_func that works on both o3-mini and gpt-4o-mini:

async def llm_model_func(
    prompt, system_prompt=None, history_messages = [], keyword_extraction=False, **kwargs
) -> str:
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if history_messages:
        messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    #reasoning models
    if AZURE_OPENAI_DEPLOYMENT.startswith("o"):
        chat_completion = client.chat.completions.create(
            model=AZURE_OPENAI_DEPLOYMENT,  # model = "deployment_name".
            messages=messages,
            reasoning_effort=kwargs.get("reasoning_effort", "low"),
            max_completion_tokens=kwargs.get("max_completion_tokens", MAX_TOKENS),
            n=kwargs.get("n", 1),
        )      
    else:  # non-reasoning models
        chat_completion = client.chat.completions.create(
            model=AZURE_OPENAI_DEPLOYMENT,  # model = "deployment_name".
            messages=messages,
            temperature=kwargs.get("temperature", 0),
            top_p=kwargs.get("top_p", 1),
            n=kwargs.get("n", 1),
        )
    return chat_completion.choices[0].message.content


async def embedding_function(texts: list[str]) -> np.ndarray:
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_EMBEDDING_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )
    embedding = client.embeddings.create(model=AZURE_EMBEDDING_DEPLOYMENT, input=texts)

    embeddings = [item.embedding for item in embedding.data]
    return np.array(embeddings)

async def initialize_rag():
    embedding = None

    if USE_LOCAL_EMBEDDING:
        embedding=EmbeddingFunc(
            embedding_dim=EMBEDDING_DIM,
            max_token_size=MAX_EMBED_TOKENS,
            func=lambda texts: ollama_embed(
                texts, embed_model=EMBEDDING_MODEL, host=EMBEDDING_BINDING_HOST
            )
        )
    else:
        embedding=EmbeddingFunc(
            embedding_dim=EMBEDDING_DIM,
            max_token_size=MAX_EMBED_TOKENS,
            func=embedding_function
        )

    if USE_LOCAL_LLM:
        rag = LightRAG(
            working_dir=WORKING_DIR,
            llm_model_func=ollama_model_complete,
            llm_model_name=LLM_MODEL,
            llm_model_max_async=MAX_ASYNC,
            llm_model_max_token_size=MAX_TOKENS,
            llm_model_kwargs={"host": LLM_BINDING_HOST, "options": {"num_ctx": 32768}},
            embedding_func=embedding
            )
        
        await rag.initialize_storages()
        await initialize_pipeline_status()

        logging.info(f"LLM Initialized for Ollama: {LLM_MODEL}, {LLM_BINDING_HOST}")
        logging.info(f"Embedding Initialized for Ollama: {EMBEDDING_MODEL}, {EMBEDDING_BINDING_HOST}")  
    else:
        rag = LightRAG(
            working_dir=WORKING_DIR,
            llm_model_func=llm_model_func,
            embedding_func=embedding
        )

        await rag.initialize_storages()
        await initialize_pipeline_status()

        logging.info(f"LLM Initialized for Azure OpenAI: {AZURE_OPENAI_DEPLOYMENT}, {AZURE_OPENAI_ENDPOINT}")
        logging.info(f"Embedding Initialized for Azure OpenAI: {AZURE_EMBEDDING_DEPLOYMENT}, {AZURE_OPENAI_ENDPOINT}")

    logging.info(f"LightRAG Daemon Initialized with chunk size: {CHUNK_SIZE}, overlap size: {CHUNK_OVERLAP_SIZE}")
    return rag

I can't find any obvious problem with your code...
And from your terminal output, it seems you are hitting your rate limits in Azure...
but it should be able to 'recover' and process a chunk after pausing and retrying...

How about you try maxing out your rate limit settings like I did:

Image
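
Another option, if the quota can't be raised, is to wrap the completion call in a simple backoff. A rough sketch (RateLimitError comes from the openai package; the retry counts and delays here are arbitrary, not tuned values):

# Rough sketch: retry the LLM call with exponential backoff on Azure 429s.
import asyncio
from openai import RateLimitError

async def llm_model_func_with_backoff(prompt, **kwargs) -> str:
    delay = 5
    for attempt in range(5):
        try:
            return await llm_model_func(prompt, **kwargs)
        except RateLimitError:
            if attempt == 4:
                raise
            await asyncio.sleep(delay)  # wait before retrying
            delay *= 2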

@JoedNgangmeni
Author

Yes. It finally works.

I have no idea why though.
