[REQUEST] Custom file indexing: Unable to save vectors in collection Qdrant #680

Open
HeyBossy opened this issue Feb 27, 2025 · 0 comments
Labels
enhancement New feature or request

HeyBossy commented Feb 27, 2025

Reference Issues

No response

Summary

I am trying to build a custom indexing pipeline that first adds a file to the index's source table and then indexes it into a specific vector store collection (index_1). The file is processed successfully, but no vectors are saved in the expected collection.

Basic Example

I am using a custom pipeline based on the default IndexDocumentPipeline from ktem. My setup includes:

  • Inserting the file into the source table.
  • Processing the file (e.g., reading a PDF, splitting text into segments).
  • Generating embeddings using the Ollama model.
  • Saving the embeddings into a Qdrant vector store configured to use the collection index_1_index.
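
To make the expected data flow concrete, the four stages above can be sketched with in-memory stand-ins. This is only an illustration, not the ktem implementation: `split_text`, `fake_embed`, `InMemoryVectorStore`, and `index_text` are hypothetical names, the splitter works on characters (the real pipeline may split by tokens), and the embedding is a toy function rather than a call to Ollama.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def split_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> List[str]:
    """Character-based splitter (stand-in; the real pipeline may split by tokens)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def fake_embed(chunk: str) -> List[float]:
    """Deterministic toy embedding; stands in for the Ollama embedding call."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 1000)]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the Qdrant-backed vector store."""
    collection: str
    points: Dict[str, List[float]] = field(default_factory=dict)

    def add(self, ids: List[str], vectors: List[List[float]]) -> None:
        self.points.update(zip(ids, vectors))


def index_text(name: str, text: str, sources: List[dict], store: InMemoryVectorStore) -> int:
    """Run the four stages and return the number of chunks produced."""
    file_id = f"file-{len(sources)}"
    sources.append({"id": file_id, "name": name})  # 1. insert a source table row
    chunks = split_text(text)                      # 2. split the text into segments
    vectors = [fake_embed(c) for c in chunks]      # 3. embed each segment
    ids = [f"{file_id}-{i}" for i in range(len(chunks))]
    store.add(ids, vectors)                        # 4. write vectors to the collection
    return len(chunks)
```

In a healthy run, the number of points in the store equals the number of chunks returned. The symptom reported here corresponds to stages 1 to 3 completing while stage 4 never writes anything.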

Example code snippet:

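from pathlib import Path
from libs.ktem.ktem.index.file.index import FileIndex
from app import app
from qdrant_client import QdrantClient
# Import for the embedding class used below (missing from the original snippet; path assumed).
from kotaemon.embeddings import OpenAIEmbeddings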

# Configure vector store with the target collection name.
vector_store_config = {
    "type": "qdrant",
    "collection": "index_1",
    "params": {
        "host": "qdrant",
        "port": 6333,
        "prefer_grpc": False
    }
}

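# Ollama exposes an OpenAI-compatible API under /v1, so the OpenAI embedding class is pointed at it.
embedding_model = OpenAIEmbeddings.withx(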
    model="nomic-embed-text",
    base_url="http://ollama:11434/v1",
    api_key="ollama"
)()

file_index = FileIndex(
    app=app,
    id=1,
    name="index_1",
    config={
        "embedding": embedding_model,
        "vector_store": vector_store_config,
        "reader_mode": "default",
        "chunk_size": 512,
        "chunk_overlap": 50
    }
)

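# Prepare the index's resources and resolve the indexing pipeline class (private FileIndex methods).
file_index._setup_resources()
file_index._setup_indexing_cls()

# Build the indexing pipeline for this user and run it on a single PDF.
file_path = "/workspace/notebook/my_pdf/sample.pdf"
indexing_pipeline = file_index.get_indexing_pipeline(settings={}, user_id="my_user_id")
indexing_pipeline.invoke([file_path])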

In the Qdrant logs, I can see GET requests checking whether the collection exists, but no PUT requests. This suggests that the pipeline never writes the generated vectors to the expected collection.
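
One way to confirm this independently of the logs is to ask Qdrant how many points the collection holds. The sketch below assumes the REST API is reachable at the host and port from the configuration above and reads `result.points_count` from `GET /collections/{name}`. The helper names (`collection_url`, `points_count`, `fetch_points_count`) are made up for this example.

```python
import json
from urllib.request import urlopen


def collection_url(host: str, port: int, collection: str) -> str:
    """Build the REST URL for a collection's info endpoint."""
    return f"http://{host}:{port}/collections/{collection}"


def points_count(payload: dict) -> int:
    """Extract result.points_count from a GET /collections/{name} response body."""
    # A missing or null count is treated as zero.
    return int(payload.get("result", {}).get("points_count") or 0)


def fetch_points_count(host: str, port: int, collection: str, timeout: float = 5.0) -> int:
    """Query Qdrant over HTTP; raises urllib.error.HTTPError if the collection does not exist."""
    with urlopen(collection_url(host, port, collection), timeout=timeout) as resp:
        return points_count(json.load(resp))
```

If `fetch_points_count("qdrant", 6333, "index_1")` still returns 0 after `indexing_pipeline.invoke(...)` has finished, the vectors were not written to that collection.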

Drawbacks

  • The current implementation does not save any vectors, which prevents retrieval functionality.
  • I cannot verify the custom indexing process if no data is saved in the target collection.
  • This issue blocks further development and integration of the custom pipeline.

Additional information

  • I have verified that the file is processed correctly and segments are generated.
  • The custom pipeline is registered via flowsettings.py with the proper path.
  • All configurations for the vector store, including the collection name, are provided as a dictionary.
  • Qdrant logs only show GET requests (e.g., checking for collection existence) and no PUT requests for saving vectors.
  • Any guidance on what might be missing in the custom pipeline or configuration would be greatly appreciated.
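
Since this report mentions both index_1 and index_1_index as the collection name, it may also be worth checking which collections actually exist, in case the vectors landed under a different name. The sketch below assumes Qdrant's `GET /collections` endpoint at the same host and port. `collection_names`, `candidates`, and `fetch_collection_names` are hypothetical helper names.

```python
import json
from typing import List
from urllib.request import urlopen


def collection_names(payload: dict) -> List[str]:
    """Extract collection names from a GET /collections response body."""
    collections = payload.get("result", {}).get("collections", [])
    return [c["name"] for c in collections]


def candidates(names: List[str], index_name: str) -> List[str]:
    """Collections whose name equals the index name or extends it with a suffix."""
    return sorted(n for n in names if n == index_name or n.startswith(index_name + "_"))


def fetch_collection_names(host: str, port: int, timeout: float = 5.0) -> List[str]:
    """List all collections known to the Qdrant instance."""
    with urlopen(f"http://{host}:{port}/collections", timeout=timeout) as resp:
        return collection_names(json.load(resp))
```

For example, `candidates(fetch_collection_names("qdrant", 6333), "index_1")` narrows the list to the names that could belong to this index. Each candidate can then be inspected for stored points.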

HeyBossy added the enhancement (New feature or request) label on Feb 27, 2025
HeyBossy changed the title from "[REQUEST] Custom file indexing: Unable to save vectors in collection index_1_index" to "[REQUEST] Custom file indexing: Unable to save vectors in collection Qdrant" on Feb 27, 2025