VDMS langchain package update #1317

Merged on Apr 2, 2025 (35 commits)
Changes from 7 commits
Commits (35):
e90c52a  Update VDMS related components (cwlacewe, Mar 24, 2025)
d7f4af6  Update VDMS related components (cwlacewe, Mar 24, 2025)
6cd8d10  Merge branch 'update_vdms' of https://github.com/cwlacewe/GenAIComps … (cwlacewe, Mar 25, 2025)
f5ebbc6  Pinned protobuf==4.24.2 for vdms to avoid issues (cwlacewe, Mar 25, 2025)
ab33713  Update VDMS related components (cwlacewe, Mar 24, 2025)
c4d01c5  Pinned protobuf==4.24.2 for vdms to avoid issues (cwlacewe, Mar 25, 2025)
aa2841b  Merge branch 'update_vdms' of https://github.com/cwlacewe/GenAIComps … (cwlacewe, Mar 25, 2025)
d208dcc  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 25, 2025)
eedf19b  Update VDMS related components (cwlacewe, Mar 24, 2025)
4a7b2cc  Merge branch 'update_vdms' of https://github.com/cwlacewe/GenAIComps … (cwlacewe, Mar 25, 2025)
ac11f7e  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 25, 2025)
b8198e5  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 25, 2025)
12900b7  pin protobuf with installation of opentelemetry (cwlacewe, Mar 25, 2025)
faa18b3  Merge branch 'update_vdms' of https://github.com/cwlacewe/GenAIComps … (cwlacewe, Mar 25, 2025)
57cedc4  pin protobuf and downgrade opentelemetry to v1.27.0 in dataprep and r… (cwlacewe, Mar 25, 2025)
845401d  Move opentelemetry installation to requirements file like others, uns… (cwlacewe, Mar 26, 2025)
29d52d2  Merge branch 'main' into update_vdms (xiguiw, Mar 26, 2025)
145558d  Merge branch 'main' into update_vdms (cwlacewe, Mar 26, 2025)
b89964d  Update VDMS related components (cwlacewe, Mar 24, 2025)
6c2746e  Pinned protobuf==4.24.2 for vdms to avoid issues (cwlacewe, Mar 25, 2025)
625a28a  Update VDMS related components (cwlacewe, Mar 24, 2025)
21ae0c4  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 25, 2025)
2bb28b0  pin protobuf with installation of opentelemetry (cwlacewe, Mar 25, 2025)
c88968f  pin protobuf and downgrade opentelemetry to v1.27.0 in dataprep and r… (cwlacewe, Mar 25, 2025)
19a0a0a  Move opentelemetry installation to requirements file like others, uns… (cwlacewe, Mar 26, 2025)
49bf4e2  Merge branch 'update_vdms' of https://github.com/cwlacewe/GenAIComps … (cwlacewe, Mar 26, 2025)
e7382f1  Merge branch 'main' into update_vdms (xiguiw, Mar 27, 2025)
1da89e0  Merge branch 'main' into update_vdms (xiguiw, Mar 27, 2025)
074bfd8  Merge branch 'main' into update_vdms (xiguiw, Mar 27, 2025)
7a081e2  Merge branch 'main' into update_vdms (cwlacewe, Mar 31, 2025)
1e25d57  Merge branch 'main' into update_vdms (xiguiw, Apr 1, 2025)
5a59760  Merge branch 'main' into update_vdms (chensuyue, Apr 2, 2025)
61a354f  Merge branch 'main' into update_vdms (xiguiw, Apr 2, 2025)
3f6357e  Merge branch 'main' into update_vdms (cwlacewe, Apr 2, 2025)
934875a  Merge branch 'main' into update_vdms (xiguiw, Apr 2, 2025)
4 changes: 4 additions & 0 deletions .github/workflows/pr-microservice-test.yml
@@ -64,6 +64,10 @@ jobs:
           role-to-assume: ${{ secrets.AWS_IAM_ROLE_ARN }}
           aws-region: us-east-1

+      - name: Set Memory Map Limit
+        if: ${{ contains(matrix.service, "opensearch") }}
+        run: sudo sysctl -w vm.max_map_count=262144
+
       - name: Run microservice test
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}

Review comment (Collaborator), on the "Set Memory Map Limit" step:

This change will break the CI test. Code changes in the GHA workflow won't take effect until they are merged, because we use "pull_request_target".

3 changes: 1 addition & 2 deletions comps/dataprep/src/Dockerfile
@@ -39,8 +39,7 @@ RUN pip install --no-cache-dir --upgrade pip setuptools && \
         PIP_EXTRA_INDEX_URL=""; \
     fi && \
     pip install --no-cache-dir torch torchvision ${PIP_EXTRA_INDEX_URL} && \
-    pip install --no-cache-dir ${PIP_EXTRA_INDEX_URL} -r /home/user/comps/dataprep/src/requirements.txt && \
-    pip install opentelemetry-api==1.29.0 opentelemetry-exporter-otlp==1.29.0 opentelemetry-sdk==1.29.0
+    pip install --no-cache-dir ${PIP_EXTRA_INDEX_URL} -r /home/user/comps/dataprep/src/requirements.txt

 ENV PYTHONPATH=$PYTHONPATH:/home/user

15 changes: 9 additions & 6 deletions comps/dataprep/src/integrations/utils/store_embeddings.py
@@ -6,10 +6,9 @@
 import numpy as np
 import torchvision.transforms as T
 from decord import VideoReader, cpu
-from langchain.pydantic_v1 import BaseModel, root_validator
-from langchain_community.vectorstores import VDMS
-from langchain_community.vectorstores.vdms import VDMS_Client
 from langchain_core.embeddings import Embeddings
+from langchain_vdms.vectorstores import VDMS, VDMS_Client
+from pydantic import BaseModel, model_validator

 toPIL = T.ToPILImage()

@@ -21,7 +20,7 @@ class vCLIPEmbeddings(BaseModel, Embeddings):

     model: Any

-    @root_validator(allow_reuse=True)
+    @model_validator(mode="before")
     def validate_environment(cls, values: Dict) -> Dict:
         """Validate that open_clip and torch libraries are installed."""
         try:
@@ -99,6 +98,8 @@ def __init__(
         collection_name,
         embedding_dimensions: int = 512,
         chosen_video_search_type="similarity",
+        engine: str = "FaissFlat",
+        distance_strategy: str = "IP",
     ):

         self.host = host
@@ -110,6 +111,8 @@ def __init__(
         self.video_embedder = vCLIPEmbeddings(model=video_retriever_model)
         self.chosen_video_search_type = chosen_video_search_type
         self.embedding_dimensions = embedding_dimensions
+        self.engine = engine
+        self.distance_strategy = distance_strategy

         # initialize_db
         self.get_db_client()
@@ -128,7 +131,7 @@ def init_db(self):
             client=self.client,
             embedding=self.video_embedder,
             collection_name=self.video_collection,
-            engine="FaissFlat",
-            distance_strategy="IP",
+            engine=self.engine,
+            distance_strategy=self.distance_strategy,
             embedding_dimensions=self.embedding_dimensions,
         )
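
The validator change above replaces root_validator from langchain.pydantic_v1 with model_validator(mode="before") from Pydantic itself. A minimal sketch of that pattern, assuming Pydantic v2 is installed. ExampleEmbedder and its json import check are hypothetical stand-ins for vCLIPEmbeddings and its open_clip/torch check, not code from this PR:

```python
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator


class ExampleEmbedder(BaseModel):
    """Hypothetical stand-in for vCLIPEmbeddings; not the PR's class."""

    model: Any

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: Any) -> Any:
        # A "before" validator receives the raw input (usually a dict)
        # prior to any field validation.
        try:
            import json  # noqa: F401  (stand-in for open_clip / torch)
        except ImportError as exc:
            raise ValueError("required library is not installed") from exc
        if isinstance(values, dict) and values.get("model") is None:
            raise ValueError("a model object must be supplied")
        return values


# Valid input passes through the validator unchanged.
embedder = ExampleEmbedder(model="dummy-model")

# Missing input is rejected; Pydantic wraps the ValueError in ValidationError.
try:
    ExampleEmbedder()
    rejected = False
except ValidationError:
    rejected = True
```

Note that a "before" validator runs ahead of field validation, so it should not assume that the values it sees have already been coerced to their declared types.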
2 changes: 1 addition & 1 deletion comps/dataprep/src/integrations/vdms.py
@@ -8,9 +8,9 @@
 from fastapi import Body, File, Form, HTTPException, UploadFile
 from langchain.text_splitter import RecursiveCharacterTextSplitter
 from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceInferenceAPIEmbeddings
-from langchain_community.vectorstores.vdms import VDMS, VDMS_Client
 from langchain_huggingface import HuggingFaceEmbeddings
 from langchain_text_splitters import HTMLHeaderTextSplitter
+from langchain_vdms.vectorstores import VDMS, VDMS_Client

 from comps import CustomLogger, DocPath, OpeaComponent, OpeaComponentRegistry, ServiceType
 from comps.dataprep.src.utils import (
14 changes: 12 additions & 2 deletions comps/dataprep/src/integrations/vdms_multimodal.py
@@ -23,6 +23,8 @@
 VECTORDB_SERVICE_HOST_IP = os.getenv("VDMS_HOST", "0.0.0.0")
 VECTORDB_SERVICE_PORT = os.getenv("VDMS_PORT", 55555)
 collection_name = os.getenv("INDEX_NAME", "rag-vdms")
+SEARCH_ENGINE = os.getenv("SEARCH_ENGINE", "FaissFlat")
+DISTANCE_STRATEGY = os.getenv("DISTANCE_STRATEGY", "IP")

 logger = CustomLogger("opea_dataprep_vdms_multimodal")
 logflag = os.getenv("LOGFLAG", False)
@@ -72,6 +74,7 @@ def store_into_vectordb(self, vs, metadata_file_path, dimensions):
             metadata_list = [data]
             if vs.selected_db == "vdms":
                 vs.video_db.add_videos(
+                    texts=video_name_list,
                     paths=video_name_list,
                     metadatas=metadata_list,
                     start_time=[data["timestamp"]],
@@ -145,14 +148,21 @@ async def ingest_videos(self, files: List[UploadFile] = File(None)):
         # init meanclip model
         model = self.setup_vclip_model(meanclip_cfg, device="cpu")
         vs = store_embeddings.VideoVS(
-            host, port, selected_db, model, collection_name, embedding_dimensions=vector_dimensions
+            host,
+            port,
+            selected_db,
+            model,
+            collection_name,
+            embedding_dimensions=vector_dimensions,
+            engine=SEARCH_ENGINE,
+            distance_strategy=DISTANCE_STRATEGY,
         )
         logger.info("done creating DB, sleep 5s")
         await asyncio.sleep(5)

         self.generate_embeddings(config, vector_dimensions, vs)

-        return {"message": "Videos ingested successfully"}
+        return {"status": 200, "message": "Videos ingested successfully"}

     async def get_videos(self):
         """Returns list of names of uploaded videos saved on the server."""
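
One detail in the unchanged context lines of this file: os.getenv returns a string whenever the variable is set, so VECTORDB_SERVICE_PORT is the int 55555 by default but a str once VDMS_PORT is exported, whereas the retriever's config.py wraps the same lookup in int(). A small sketch of normalizing the value. get_port is a hypothetical helper and is not part of this PR:

```python
import os


def get_port(name: str = "VDMS_PORT", default: int = 55555) -> int:
    """Hypothetical helper: always return the port as an int."""
    # os.getenv yields a str when the variable is set and the supplied
    # default (here an int) when it is not, so normalize with int().
    return int(os.getenv(name, default))


# Variable not set: the integer default is used.
os.environ.pop("VDMS_PORT", None)
default_port = get_port()

# Variable set: the raw lookup is a str, the helper still returns an int.
os.environ["VDMS_PORT"] = "55566"
custom_port = get_port()
raw_value = os.getenv("VDMS_PORT", 55555)
```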
10 changes: 7 additions & 3 deletions comps/dataprep/src/requirements.txt
@@ -9,7 +9,7 @@ einops
 elasticsearch
 fastapi
 future
-graspologic
+graspologic
 html2text
 huggingface_hub
 ipython
@@ -21,9 +21,10 @@ langchain-openai
 langchain-pinecone
 langchain-redis
 langchain-text-splitters
+langchain-vdms>=0.1.4
 langchain_huggingface
 langchain_milvus
-llama-index
+llama-index
 llama-index-core==0.12.19
 llama-index-embeddings-text-embeddings-inference
 llama-index-graph-stores-neo4j
@@ -37,11 +38,15 @@ openai
 openai-whisper
 opencv-python
 opensearch-py
+opentelemetry-api==1.27.0
+opentelemetry-exporter-otlp==1.27.0
+opentelemetry-sdk==1.27.0
 pandas
 pgvector==0.2.5
 Pillow
 pinecone-client
 prometheus-fastapi-instrumentator
+protobuf==4.24.2
 psycopg2
 pymupdf
 pyspark
@@ -60,5 +65,4 @@ typing
 tzlocal
 unstructured[all-docs]
 uvicorn
-vdms
 webvtt-py
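
The pins added here (protobuf==4.24.2 and the three opentelemetry packages at 1.27.0) can be checked from inside a built image. A minimal sketch, assuming it runs in the environment where this requirements.txt was installed. PINS and check_pins are hypothetical names introduced for illustration; importlib.metadata is the standard-library lookup:

```python
from importlib import metadata
from typing import Callable, Dict

# Pins copied from the requirements.txt changes in this PR.
PINS: Dict[str, str] = {
    "protobuf": "4.24.2",
    "opentelemetry-api": "1.27.0",
    "opentelemetry-exporter-otlp": "1.27.0",
    "opentelemetry-sdk": "1.27.0",
}


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: problem} for every pin that is not satisfied."""
    problems: Dict[str, str] = {}
    for package, wanted in pins.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if found != wanted:
            problems[package] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Prints nothing when every pin matches the installed version.
    for package, problem in check_pins(PINS).items():
        print(f"{package}: {problem}")
```

The lookup parameter exists only so the check can be exercised without the packages installed; in normal use the default is enough.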
12 changes: 12 additions & 0 deletions comps/retrievers/deployment/docker_compose/compose.yaml
@@ -179,6 +179,18 @@ services:
       tei-embedding-serving:
         condition: service_healthy

+  retriever-vdms-multimodal:
+    extends: retriever
+    container_name: retriever-vdms-multimodal
+    environment:
+      RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_VDMS"
+      VDMS_INDEX_NAME: ${INDEX_NAME}
+      VDMS_HOST: ${host_ip}
+      VDMS_PORT: ${VDMS_PORT}
+      VDMS_USE_CLIP: ${VDMS_USE_CLIP}
+    depends_on:
+      vdms-vector-db:
+        condition: service_healthy

 networks:
   default:
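
The new service passes VDMS_PORT and VDMS_USE_CLIP straight through from the shell. Docker Compose substitutes an empty string for a variable that is not set, and the retriever's config.py parses both values with int(), which raises ValueError on an empty string. A sketch of a tolerant parser, offered as one possible approach rather than anything this PR does. env_int is a hypothetical helper:

```python
from typing import Optional


def env_int(raw: Optional[str], default: int) -> int:
    """Hypothetical helper: parse an integer setting, treating None or "" as unset."""
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


# An empty value (what an unset ${VDMS_USE_CLIP} becomes) falls back to the default.
use_clip_unset = env_int("", 0)
use_clip_on = env_int("1", 0)
port_default = env_int(None, 55555)

# For comparison, plain int() rejects the empty string.
try:
    int("")
    empty_string_parses = True
except ValueError:
    empty_string_parses = False
```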
3 changes: 1 addition & 2 deletions comps/retrievers/src/Dockerfile
@@ -25,8 +25,7 @@ RUN pip install --no-cache-dir --upgrade pip setuptools && \
         PIP_EXTRA_INDEX_URL=""; \
     fi && \
     pip install --no-cache-dir torch torchvision ${PIP_EXTRA_INDEX_URL} && \
-    pip install --no-cache-dir ${PIP_EXTRA_INDEX_URL} -r /home/user/comps/retrievers/src/requirements.txt && \
-    pip install opentelemetry-api==1.29.0 opentelemetry-exporter-otlp==1.29.0 opentelemetry-sdk==1.29.0
+    pip install --no-cache-dir ${PIP_EXTRA_INDEX_URL} -r /home/user/comps/retrievers/src/requirements.txt

 ENV PYTHONPATH=$PYTHONPATH:/home/user

4 changes: 2 additions & 2 deletions comps/retrievers/src/integrations/config.py
@@ -184,5 +184,5 @@ def format_opensearch_conn_from_env():
 VDMS_PORT = int(os.getenv("VDMS_PORT", 55555))
 VDMS_INDEX_NAME = os.getenv("VDMS_INDEX_NAME", "rag_vdms")
 VDMS_USE_CLIP = int(os.getenv("VDMS_USE_CLIP", 0))
-SEARCH_ENGINE = "FaissFlat"
-DISTANCE_STRATEGY = "IP"
+SEARCH_ENGINE = os.getenv("SEARCH_ENGINE", "FaissFlat")
+DISTANCE_STRATEGY = os.getenv("DISTANCE_STRATEGY", "IP")
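
With this change the retriever reads SEARCH_ENGINE and DISTANCE_STRATEGY from the environment and falls back to the previously hard-coded values. A sketch of how that lookup behaves. load_search_settings is a hypothetical helper, and "L2" is used only as an example override that is assumed to be accepted by the vector store:

```python
import os
from typing import Dict, Mapping


def load_search_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper mirroring the two os.getenv calls in config.py."""
    return {
        "engine": env.get("SEARCH_ENGINE", "FaissFlat"),
        "distance_strategy": env.get("DISTANCE_STRATEGY", "IP"),
    }


# No variables set: the defaults match the values that used to be hard-coded.
defaults = load_search_settings({})

# Overriding one variable leaves the other at its default.
overridden = load_search_settings({"DISTANCE_STRATEGY": "L2"})
```

Because dataprep's vdms_multimodal.py now reads the same two variables, setting them once in the deployment environment should keep ingestion and retrieval on the same engine and distance metric.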