
[Bug]: (rag.query_with_separate_keyword_extraction) When I use word embeddings provided by SiliconFlow, retrieval fails, but this does not occur with the OpenAI interface. #1378


Closed
1 of 2 tasks
qcjySONG opened this issue Apr 15, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@qcjySONG

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

When I use the rag.query() function, it returns normally. However, when I use rag.query_with_separate_keyword_extraction(), a retrieval failure occurs.

Steps to reproduce

import os

import numpy as np

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.llm.siliconcloud import siliconcloud_embedding
from lightrag.llm.hf import hf_embed
from lightrag.utils import EmbeddingFunc

# DeepSeek API credentials
LLM_MODEL = os.environ.get("LLM_MODEL", "deepseek-reasoner")
BASE_URL = os.environ.get("BASE_URL", "https://api.deepseek.com/v1")
API_KEY = os.environ.get("API_KEY", "sk-123456789")



async def embedding_func(texts: list[str]) -> np.ndarray:
    print(texts)
    return await siliconcloud_embedding(
        texts=texts,
        model='BAAI/bge-m3',
        base_url='https://api.siliconflow.cn/v1/embeddings',
        api_key='sk-123456789',
    )

async def get_embedding_dim():
    test_text = ["This is a test sentence."]
    embedding = await embedding_func(test_text)
    print(embedding)
    embedding_dim = embedding.shape[1]
    print(f"{embedding_dim=}")
    return embedding_dim    

# LLM model function
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        model=LLM_MODEL,
        prompt=prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        base_url=BASE_URL,
        api_key=API_KEY,
        **kwargs,
    )
 
working_dir='/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb'

pro = "my Pro"

rag = LightRAG(
    working_dir=working_dir,
    llm_model_func=llm_model_func,
    llm_model_max_token_size=65536,
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,
        max_token_size=8192,
        func=embedding_func
    ),
    enable_llm_cache=False
)

# temp=rag.query(
#         pro,
#         param=QueryParam(mode='global',only_need_context=True)
#     )

# print(temp)

#"local","global","global","hybrid"

temp = rag.query_with_separate_keyword_extraction(
        query="How ……",
        prompt=pro,
        param=QueryParam(mode="local", only_need_context=True)
    )
print(temp)

Returned output:
INFO: Process 7831 Shared-Data created for Single Process
INFO:nano-vectordb:Load (37, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb/vdb_entities.json'} 37 data
INFO:nano-vectordb:Load (42, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb/vdb_relationships.json'} 42 data
INFO:nano-vectordb:Load (43, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb/vdb_chunks.json'} 43 data
INFO: Process 7831 initialized updated flags for namespace: [full_docs]
INFO: Process 7831 ready to initialize storage namespace: [full_docs]
INFO: Process 7831 initialized updated flags for namespace: [text_chunks]
INFO: Process 7831 ready to initialize storage namespace: [text_chunks]
INFO: Process 7831 initialized updated flags for namespace: [entities]
INFO: Process 7831 initialized updated flags for namespace: [relationships]
INFO: Process 7831 initialized updated flags for namespace: [chunks]
INFO: Process 7831 initialized updated flags for namespace: [chunk_entity_relation]
INFO: Process 7831 initialized updated flags for namespace: [llm_response_cache]
INFO: Process 7831 ready to initialize storage namespace: [llm_response_cache]
INFO: Process 7831 initialized updated flags for namespace: [doc_status]
INFO: Process 7831 ready to initialize storage namespace: [doc_status]
No keywords found in query_param. Could default to global mode or fail.
Sorry, I'm not able to provide an answer to that question.[no-context]
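The key line is "No keywords found in query_param. Could default to global mode or fail.", which suggests that in 1.3.1 the separate keyword-extraction step never populates the keyword fields on QueryParam before retrieval runs. As a workaround sketch, keywords can be supplied explicitly; note that the hl_keywords/ll_keywords field names are an assumption based on that log message, and the keyword values are hypothetical:

# Workaround sketch: bypass query_with_separate_keyword_extraction() and
# pass keywords explicitly, so retrieval does not depend on the broken
# extraction step. The hl_keywords/ll_keywords fields and the example
# keyword values below are assumptions, not confirmed API for 1.3.1.
temp = rag.query(
    pro,
    param=QueryParam(
        mode="local",
        only_need_context=True,
        hl_keywords=["repository statistics"],      # hypothetical high-level keywords
        ll_keywords=["GitHub", "stars", "forks"],   # hypothetical low-level keywords
    ),
)
print(temp)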

Expected Behavior

No response

LightRAG Config Used

LightRAG 1.3.1

Logs and screenshots

No response

Additional Information

  • LightRAG Version: 1.3.1
  • Operating System:
  • Python Version: 3.11
  • Related Issues:
qcjySONG added the bug (Something isn't working) label on Apr 15, 2025
@qcjySONG (Author)

I'm sorry, the report above still contains errors. The version I used previously worked fine. After updating to the current 1.3.1 release, even an embedding model served through an OpenAI-compatible interface (though not a ChatGPT model) can no longer use this function, specifically rag.query_with_separate_keyword_extraction().
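For reference, this is the OpenAI-style wiring I mean. A minimal sketch, assuming SiliconFlow exposes an OpenAI-compatible endpoint at https://api.siliconflow.cn/v1 and that openai_embed accepts model/base_url/api_key keyword arguments (both are assumptions on my part):

# Sketch: the same BAAI/bge-m3 embeddings called through LightRAG's
# OpenAI-style helper instead of the siliconcloud-specific one.
# Assumption: https://api.siliconflow.cn/v1 is an OpenAI-compatible base URL.
async def openai_style_embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="BAAI/bge-m3",
        base_url="https://api.siliconflow.cn/v1",
        api_key="sk-123456789",
    )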
