
[Bug]: (rag.query_with_separate_keyword_extraction) When I use word embeddings provided by SiliconFlow, retrieval fails, but this does not occur with the OpenAI interface. #1378


Closed
1 of 2 tasks
qcjySONG opened this issue Apr 15, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@qcjySONG

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

When I use the rag.query() function, it returns normally. However, when I use rag.query_with_separate_keyword_extraction(), a retrieval failure occurs.

Steps to reproduce

import os

import numpy as np

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.llm.siliconcloud import siliconcloud_embedding
from lightrag.llm.hf import hf_embed
from lightrag.utils import EmbeddingFunc

# DeepSeek API credentials
LLM_MODEL = os.environ.get("LLM_MODEL", "deepseek-reasoner")
BASE_URL = os.environ.get("BASE_URL", "https://api.deepseek.com/v1")
API_KEY = os.environ.get("API_KEY", "sk-123456789")



async def embedding_func(texts: list[str]) -> np.ndarray:
    print(texts)
    return await siliconcloud_embedding(
        texts=texts,
        model='BAAI/bge-m3',
        base_url='https://api.siliconflow.cn/v1/embeddings',
        api_key='sk-123456789',
    )

async def get_embedding_dim():
    test_text = ["This is a test sentence."]
    embedding = await embedding_func(test_text)
    print(embedding)
    embedding_dim = embedding.shape[1]
    print(f"{embedding_dim=}")
    return embedding_dim    

# LLM model function
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        model=LLM_MODEL,
        prompt=prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        base_url=BASE_URL,
        api_key=API_KEY,
        **kwargs,
    )
 
working_dir='/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb'

pro = "my Pro"

rag = LightRAG(
    working_dir=working_dir,
    llm_model_func=llm_model_func,
    llm_model_max_token_size=65536,
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,
        max_token_size=8192,
        func=embedding_func
    ),
    enable_llm_cache=False
)

# temp=rag.query(
#         pro,
#         param=QueryParam(mode='global',only_need_context=True)
#     )

# print(temp)

#"local","global","global","hybrid"

temp = rag.query_with_separate_keyword_extraction(
        query="How ……",
        prompt=pro,
        param=QueryParam(mode="local", only_need_context=True)
    )
print(temp)

Returned output:
INFO: Process 7831 Shared-Data created for Single Process
INFO:nano-vectordb:Load (37, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb/vdb_entities.json'} 37 data
INFO:nano-vectordb:Load (42, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb/vdb_relationships.json'} 42 data
INFO:nano-vectordb:Load (43, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/amax/qcjySONG/newsql/Spider2_0406/spider2-snow/resource/databases/GITHUB_REPOS/GITHUB_REPOS_emb/vdb_chunks.json'} 43 data
INFO: Process 7831 initialized updated flags for namespace: [full_docs]
INFO: Process 7831 ready to initialize storage namespace: [full_docs]
INFO: Process 7831 initialized updated flags for namespace: [text_chunks]
INFO: Process 7831 ready to initialize storage namespace: [text_chunks]
INFO: Process 7831 initialized updated flags for namespace: [entities]
INFO: Process 7831 initialized updated flags for namespace: [relationships]
INFO: Process 7831 initialized updated flags for namespace: [chunks]
INFO: Process 7831 initialized updated flags for namespace: [chunk_entity_relation]
INFO: Process 7831 initialized updated flags for namespace: [llm_response_cache]
INFO: Process 7831 ready to initialize storage namespace: [llm_response_cache]
INFO: Process 7831 initialized updated flags for namespace: [doc_status]
INFO: Process 7831 ready to initialize storage namespace: [doc_status]
No keywords found in query_param. Could default to global mode or fail.
Sorry, I'm not able to provide an answer to that question.[no-context]
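The key line is "No keywords found in query_param. Could default to global mode or fail.", which suggests that in 1.3.1 the separate keyword-extraction step never populates the keyword fields on QueryParam before retrieval runs. As a workaround sketch, keywords can be supplied explicitly; note that the hl_keywords/ll_keywords field names are an assumption based on that log message, and the keyword values are hypothetical:

# Workaround sketch: bypass query_with_separate_keyword_extraction() and
# pass keywords explicitly, so retrieval does not depend on the broken
# extraction step. The hl_keywords/ll_keywords fields and the example
# keyword values below are assumptions, not confirmed API for 1.3.1.
temp = rag.query(
    pro,
    param=QueryParam(
        mode="local",
        only_need_context=True,
        hl_keywords=["repository statistics"],      # hypothetical high-level keywords
        ll_keywords=["GitHub", "stars", "forks"],   # hypothetical low-level keywords
    ),
)
print(temp)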

Expected Behavior

No response

LightRAG Config Used

LightRAG 1.3.1

Logs and screenshots

No response

Additional Information

  • LightRAG Version: 1.3.1
  • Operating System:
  • Python Version: 3.11
  • Related Issues:
qcjySONG added the bug (Something isn't working) label on Apr 15, 2025
@qcjySONG (Author)

I'm sorry, the report above still contains errors. The version I used previously worked fine. After updating to the current 1.3.1 release, even an embedding model served through an OpenAI-compatible interface (though not a ChatGPT model) can no longer use this function, specifically rag.query_with_separate_keyword_extraction().
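For reference, this is the OpenAI-style wiring I mean. A minimal sketch, assuming SiliconFlow exposes an OpenAI-compatible endpoint at https://api.siliconflow.cn/v1 and that openai_embed accepts model/base_url/api_key keyword arguments (both are assumptions on my part):

# Sketch: the same BAAI/bge-m3 embeddings called through LightRAG's
# OpenAI-style helper instead of the siliconcloud-specific one.
# Assumption: https://api.siliconflow.cn/v1 is an OpenAI-compatible base URL.
async def openai_style_embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="BAAI/bge-m3",
        base_url="https://api.siliconflow.cn/v1",
        api_key="sk-123456789",
    )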
