I have searched the existing issues and this bug is not already filed.
I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
Problem Description
When running LightRAG, document chunking completes successfully, but the entity extraction phase fails with a 401 authentication error. Interestingly, later query operations succeed in calling the API correctly.
Detailed Observation
Document Chunking Success: The document is successfully chunked into 831 segments as shown in the log:
INFO: Process 3982309 KV load text_chunks with 831 records
Partial Entity Extraction: The system begins entity extraction and successfully processes the first few chunks:
INFO: Chk 1/831: extracted 1 Ent + 0 Rel
INFO: Chk 2/831: extracted 1 Ent + 0 Rel
INFO: Chk 3/831: extracted 2 Ent + 1 Rel
Authentication Error: After processing only 3 chunks, entity extraction fails with an API authentication error:
ERROR: Failed to process document doc-addb4618e1697da0445ec72a648e1f92: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-************************************************************************************************************************lLwA...
However, this key is not the one I set in my .env.
Query Operations Succeed: Despite the entity extraction failure, the subsequent query operations complete successfully, so I am confident the API calls using my .env configuration are correct:
🔍 Query: 'What are the main characters in the story?' (Mode: local)
INFO: Process 3982309 buidling query context...
INFO: Query nodes: Protagonist, Antagonist, Supporting characters, Character development, Plot, top_k: 60, cosine: 0.2
Execution time: 1.44 seconds
📊 Response length: 70 characters
📄 Response preview: Sorry, I'm not able to provide an answer to that question.[no-context]...
🔍 Query: 'What happened during the Christmas celebration?' (Mode: global)
INFO: Process 3982309 buidling query context...
INFO: Query edges: Christmas celebration, Cultural practices, Festive events, top_k: 60, cosine: 0.2
Execution time: 3.82 seconds
📊 Response length: 70 characters
📄 Response preview: Sorry, I'm not able to provide an answer to that question.[no-context]...
🔍 Query: 'How did Scrooge's character change throughout the story?' (Mode: mix)
INFO: Process 3982309 buidling query context...
INFO: Query nodes: A Christmas Carol, Redemption, Greed, Kindness, Interactions with Marley, Tiny Tim, top_k: 60, cosine: 0.2
INFO: Query edges: Character development, Literary analysis, Scrooge's transformation, top_k: 60, cosine: 0.2
Execution time: 4.03 seconds
📊 Response length: 70 characters
📄 Response preview: Sorry, I'm not able to provide an answer to that question.[no-context]...
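A quick way to check whether the shell environment and the .env file disagree about the key (a minimal, stdlib-only sketch; `read_env_file` is a hypothetical helper standing in for python-dotenv and only handles simple `KEY=VALUE` lines):

```python
import os

def read_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; comments and blanks ignored."""
    values = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, val = line.partition("=")
                    values[key.strip()] = val.strip().strip("'\"")
    except FileNotFoundError:
        pass
    return values

if __name__ == "__main__":
    shell_key = os.environ.get("OPENAI_API_KEY", "")
    file_key = read_env_file().get("OPENAI_API_KEY", "")
    # Print only prefixes so full keys are not leaked in logs
    print("shell:", shell_key[:12], "| .env:", file_key[:12])
    if shell_key and file_key and shell_key != file_key:
        print("Mismatch: an exported shell variable may be shadowing .env")
```

If the two prefixes differ, a stale `OPENAI_API_KEY` exported in the shell is the likely source of the masked `sk-proj-...` key in the 401 error.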
basic_setup.py

import os
import asyncio
import time

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import openai_embed, gpt_4o_mini_complete
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger
from dotenv import load_dotenv

# Environment setup
load_dotenv()

# Configure logging
setup_logger("lightrag", level="INFO")


async def initialize_rag(config=None):
    """Initialize a LightRAG instance with configurable parameters"""
    # Default configuration
    default_config = {
        "working_dir": "./lightrag_cache",
        "embedding_func": openai_embed,
        "llm_model_func": gpt_4o_mini_complete,
        "kv_storage": "JsonKVStorage",
        "vector_storage": "NanoVectorDBStorage",
        "graph_storage": "NetworkXStorage",
        "chunk_token_size": 1200,
        "chunk_overlap_token_size": 100,
        "embedding_batch_num": 32,
        "embedding_func_max_async": 16,
        "llm_model_max_async": 4,
        "max_parallel_insert": 2,
        "entity_extract_max_gleaning": 1,
    }

    # Override defaults with provided config
    if config:
        default_config.update(config)

    # Create LightRAG instance
    rag = LightRAG(
        working_dir=default_config["working_dir"],
        embedding_func=default_config["embedding_func"],
        llm_model_func=default_config["llm_model_func"],
        kv_storage=default_config["kv_storage"],
        vector_storage=default_config["vector_storage"],
        graph_storage=default_config["graph_storage"],
        chunk_token_size=default_config["chunk_token_size"],
        chunk_overlap_token_size=default_config["chunk_overlap_token_size"],
        embedding_batch_num=default_config["embedding_batch_num"],
        embedding_func_max_async=default_config["embedding_func_max_async"],
        llm_model_max_async=default_config["llm_model_max_async"],
        max_parallel_insert=default_config["max_parallel_insert"],
        entity_extract_max_gleaning=default_config["entity_extract_max_gleaning"],
    )

    # Initialize storages and pipeline status
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag


async def measure_performance(func, *args, **kwargs):
    """Measure execution time of a given function"""
    start_time = time.time()
    result = await func(*args, **kwargs)
    end_time = time.time()
    print(f"Execution time: {end_time - start_time:.2f} seconds")
    return result


async def load_document(rag, file_path):
    """Load a document from a file"""
    with open(file_path, 'r') as f:
        content = f.read()
    return content


async def insert_document(rag, document, split_by_character=None, split_by_character_only=False):
    """Insert a document into LightRAG"""
    await rag.ainsert(document, split_by_character, split_by_character_only)
    print(f"Document inserted. Length: {len(document)} characters")


async def query_document(rag, query, mode="global", top_k=60):
    """Query the document with the specified mode"""
    param = QueryParam(mode=mode, top_k=top_k)
    response = await rag.aquery(query, param=param)
    return response


def run_async(coroutine):
    """Run an async function"""
    return asyncio.run(coroutine)
main.py
import os
import asyncio

from lightrag_examples.basic_setup import initialize_rag, measure_performance, load_document, insert_document, query_document


async def run_embedding_optimized():
    """Run LightRAG optimized for embedding bottlenecks"""
    print("⚙️ Running LightRAG with embedding optimization")

    # Configure for embedding bottlenecks
    config = {
        "working_dir": "./lightrag_cache_embedding_opt",
        "embedding_batch_num": 64,  # Increased batch size for embeddings
        "embedding_func_max_async": 32,  # More concurrent embedding operations
        "chunk_token_size": 2000,  # Larger chunks to reduce total embeddings
        "chunk_overlap_token_size": 200,  # Increased overlap for better context preservation
        # Enable embedding cache to avoid redundant computations
        "embedding_cache_config": {
            "enabled": True,
            "similarity_threshold": 0.92,
            "use_llm_check": False,
        },
    }

    # Initialize RAG with optimized config
    rag = await initialize_rag(config)

    # Load document
    document = await load_document(rag, "lightrag_examples/sample_book.txt")

    # Insert document with performance measurement
    print("📥 Inserting document with embedding optimization...")
    await measure_performance(insert_document, rag, document, "\n\n", False)

    # Query with various modes
    print("\n📝 Testing queries with different modes:")
    queries = [
        ("What are the main characters in the story?", "local"),
        ("What happened during the Christmas celebration?", "global"),
        ("How did Scrooge's character change throughout the story?", "mix"),
    ]
    for query, mode in queries:
        print(f"\n🔍 Query: '{query}' (Mode: {mode})")
        response = await measure_performance(query_document, rag, query, mode)
        print(f"📊 Response length: {len(str(response))} characters")
        print(f"📄 Response preview: {str(response)[:200]}...")

    print("\n✅ Embedding-optimized example completed")
Expected Behavior
No response
LightRAG Config Used
Paste your config here
Logs and screenshots
INFO: Process 3982309 Shared-Data created for Single Process
INFO: Loaded graph from ./lightrag_cache_embedding_opt/graph_chunk_entity_relation.graphml with 0 nodes, 0 edges
INFO:nano-vectordb:Load (0, 1536) data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './lightrag_cache_embedding_opt/vdb_entities.json'} 0 data
INFO:nano-vectordb:Load (0, 1536) data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './lightrag_cache_embedding_opt/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Load (0, 1536) data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': './lightrag_cache_embedding_opt/vdb_chunks.json'} 0 data
INFO: Process 3982309 initialized updated flags for namespace: [full_docs]
INFO: Process 3982309 ready to initialize storage namespace: [full_docs]
INFO: Process 3982309 KV load full_docs with 1 records
INFO: Process 3982309 initialized updated flags for namespace: [text_chunks]
INFO: Process 3982309 ready to initialize storage namespace: [text_chunks]
INFO: Process 3982309 KV load text_chunks with 831 records
INFO: Process 3982309 initialized updated flags for namespace: [entities]
INFO: Process 3982309 initialized updated flags for namespace: [relationships]
INFO: Process 3982309 initialized updated flags for namespace: [chunks]
INFO: Process 3982309 initialized updated flags for namespace: [chunk_entity_relation]
INFO: Process 3982309 initialized updated flags for namespace: [llm_response_cache]
INFO: Process 3982309 ready to initialize storage namespace: [llm_response_cache]
INFO: Process 3982309 KV load llm_response_cache with 12 records
INFO: Process 3982309 initialized updated flags for namespace: [doc_status]
INFO: Process 3982309 ready to initialize storage namespace: [doc_status]
INFO: Process 3982309 doc status load doc_status with 1 records
INFO: Process 3982309 storage namespace already initialized: [full_docs]
INFO: Process 3982309 storage namespace already initialized: [text_chunks]
INFO: Process 3982309 storage namespace already initialized: [llm_response_cache]
INFO: Process 3982309 storage namespace already initialized: [doc_status]
INFO: Process 3982309 Pipeline namespace initialized
📥 Inserting document with embedding optimization...
INFO: No new unique documents were found.
INFO: Storage Initialization completed!
INFO: Processing 1 document(s) in 1 batches
INFO: Start processing batch 1 of 1.
INFO: Processing file: unknown_source
INFO: Processing d-id: doc-addb4618e1697da0445ec72a648e1f92
INFO: Process 3982309 doc status writting 1 records to doc_status
INFO: Chk 1/831: extracted 1 Ent + 0 Rel
INFO: Chk 2/831: extracted 1 Ent + 0 Rel
INFO: Chk 3/831: extracted 2 Ent + 1 Rel
ERROR: Failed to process document doc-addb4618e1697da0445ec72a648e1f92: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-************************************************************************************************************************lLwA. You can find your API key at https://platform.openai.com/account/api-keys. (request id: 2025041417484467030518430jtGrk6) (request id: 202504141748446011421233WddDLyx)', 'type': 'invalid_request_error', 'param': '', 'code': 'invalid_api_key'}}
INFO: Process 3982309 doc status writting 1 records to doc_status
INFO: Process 3982309 KV writting 1 records to full_docs
INFO: Process 3982309 KV writting 831 records to text_chunks
INFO: Writing graph with 0 nodes, 0 edges
INFO: In memory DB persist to disk
INFO: Completed batch 1 of 1.
INFO: Document processing pipeline completed
Document inserted. Length: 185067 characters
Execution time: 4.03 seconds
📝 Testing queries with different modes:
🔍 Query: 'What are the main characters in the story?' (Mode: local)
INFO: Process 3982309 buidling query context...
INFO: Query nodes: Protagonist, Antagonist, Supporting characters, Character development, Plot, top_k: 60, cosine: 0.2
Execution time: 1.44 seconds
📊 Response length: 70 characters
📄 Response preview: Sorry, I'm not able to provide an answer to that question.[no-context]...
🔍 Query: 'What happened during the Christmas celebration?' (Mode: global)
INFO: Process 3982309 buidling query context...
INFO: Query edges: Christmas celebration, Cultural practices, Festive events, top_k: 60, cosine: 0.2
Execution time: 3.82 seconds
📊 Response length: 70 characters
📄 Response preview: Sorry, I'm not able to provide an answer to that question.[no-context]...
🔍 Query: 'How did Scrooge's character change throughout the story?' (Mode: mix)
INFO: Process 3982309 buidling query context...
INFO: Query nodes: A Christmas Carol, Redemption, Greed, Kindness, Interactions with Marley, Tiny Tim, top_k: 60, cosine: 0.2
INFO: Query edges: Character development, Literary analysis, Scrooge's transformation, top_k: 60, cosine: 0.2
Execution time: 4.03 seconds
📊 Response length: 70 characters
📄 Response preview: Sorry, I'm not able to provide an answer to that question.[no-context]...
✅ Embedding-optimized example completed
INFO: Creating a new event loop in main thread.
Additional Information
LightRAG Version:
Operating System:
Python Version:
Related Issues:
The OS environment variables take precedence over the .env file. Please launch a new terminal session for the updated .env file changes to take effect.
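For context on that precedence: python-dotenv's `load_dotenv()` leaves any variable that is already set in the process environment untouched unless you pass `override=True`. A minimal sketch of the rule, using a hypothetical `apply_env_file` helper in place of `load_dotenv` so the example stays dependency-free:

```python
import os

def apply_env_file(values, override=False):
    """Mimic python-dotenv's precedence: existing process variables win
    unless override=True is passed."""
    for key, val in values.items():
        if override or key not in os.environ:
            os.environ[key] = val

# Simulate a key exported in the shell before the script starts
os.environ["DEMO_API_KEY"] = "sk-from-shell"

# The value from the .env file is ignored by default...
apply_env_file({"DEMO_API_KEY": "sk-from-dotenv"})
print(os.environ["DEMO_API_KEY"])  # sk-from-shell

# ...but wins when override is requested (load_dotenv(override=True)
# behaves the same way)
apply_env_file({"DEMO_API_KEY": "sk-from-dotenv"}, override=True)
print(os.environ["DEMO_API_KEY"])  # sk-from-dotenv
```

So either launch a fresh terminal (or `unset OPENAI_API_KEY`) as suggested above, or call `load_dotenv(override=True)` if the .env file should always win.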
Steps to reproduce
Download data
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > lightrag_examples/sample_book.txt
Script
basic_setup and main.py, as listed above.