From 152ead035b8d9258802d24151e2527e9caf53c5c Mon Sep 17 00:00:00 2001
From: Artem Astafev
Date: Tue, 1 Apr 2025 17:58:02 +0700
Subject: [PATCH 01/25] Init changes for MultimodalQnA for vLLM

Signed-off-by: Artem Astafev
---
 .../docker_compose/amd/gpu/rocm/README.md     | 477 +++++++++++-------
 .../amd/gpu/rocm/compose_vllm.yaml            | 186 +++++++
 .../amd/gpu/rocm/set_env_vllm.sh              |  33 ++
 MultimodalQnA/docker_image_build/build.yaml   |   5 +
 .../tests/test_compose_vllm_on_rocm.sh        | 331 ++++++++++++
 5 files changed, 847 insertions(+), 185 deletions(-)
 create mode 100644 MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml
 create mode 100644 MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh
 create mode 100644 MultimodalQnA/tests/test_compose_vllm_on_rocm.sh

diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md
index 4e3a031da9..fc4e95b1d2 100644
--- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md
+++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md
@@ -1,4 +1,4 @@
-# Build Mega Service of MultimodalQnA for AMD ROCm
+# Build and Deploy MultimodalQnA Application on AMD GPU (ROCm)

This document outlines the deployment process for a MultimodalQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an AMD server with ROCm GPUs. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `multimodal_embedding`, which employs the [BridgeTower](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi) model as the embedding model, as well as `multimodal_retriever`, `lvm`, and `multimodal-data-prep`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.

@@ -6,95 +6,216 @@ For detailed information about these instance types, you can refer to this [link
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your server, allowing you to install, configure, and manage your applications as needed.

-## Setup Environment Variables
+## Build Docker Images

-Since the `compose.yaml` will consume some environment variables, you need to setup them in advance as below.
+### 1. Build Docker Image

-Please use `./set_env.sh` (. set_env.sh) script to set up all needed Environment Variables.
+- #### Create the application install directory and go to it:

-**Export the value of the public IP address of your server to the `host_ip` environment variable**
+  ```bash
+  mkdir ~/multimodalqna-install && cd ~/multimodalqna-install
+  ```

-Note: Please replace with `host_ip` with you external IP address, do not use localhost.
+- #### Clone the GenAIExamples repository (the default branch "main" is used here):

-## 🚀 Build Docker Images
+  ```bash
+  git clone https://github.com/opea-project/GenAIExamples.git
+  ```

-### 1. 
Build embedding-multimodal-bridgetower Image + If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value): -Build embedding-multimodal-bridgetower docker image + ```bash + git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3 + ``` -```bash -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -docker build --no-cache -t opea/embedding-multimodal-bridgetower:latest --build-arg EMBEDDER_PORT=$EMBEDDER_PORT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/bridgetower/src/Dockerfile . -``` + We remind you that when using a specific version of the code, you need to use the README from this version: -Build embedding microservice image +- #### Go to build directory: -```bash -docker build --no-cache -t opea/embedding:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/src/Dockerfile . -``` + ```bash + cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_image_build + ``` -### 2. Build LVM Images +- Cleaning up the GenAIComps repository if it was previously cloned in this directory. + This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty: -Build lvm-llava image + ```bash + echo Y | rm -R GenAIComps + ``` -```bash -docker build --no-cache -t opea/lvm-llava:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/lvms/src/integrations/dependency/llava/Dockerfile . +- #### Clone the repository GenAIComps (the default repository branch "main" is used here): + + ```bash + git clone https://github.com/opea-project/GenAIComps.git + ``` + + If you use a specific tag of the GenAIExamples repository, + then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value): + + ```bash + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3 + ``` + + We remind you that when using a specific version of the code, you need to use the README from this version. + +- #### Setting the list of images for the build (from the build file.yaml) + + If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows: + + #### vLLM-based application + + ```bash + service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper vllm-rocm" + ``` + + #### TGI-based application + + ```bash + service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper" + ``` + +- #### Optional. 
Pull TGI Docker Image (Do this if you want to use TGI) + + ```bash + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + ``` + +- #### Build Docker Images + + ```bash + docker compose -f build.yaml build ${service_list} --no-cache + ``` + + After the build, we check the list of images with the command: + + ```bash + docker image ls + ``` + + The list of images should include: + + ##### vLLM-based application: + + - opea/vllm-rocm:latest + - opea/lvm:latest + - opea/multimodalqna:latest + - opea/multimodalqna-ui:latest + - opea/dataprep:latest + - opea/embedding:latest + - opea/embedding-multimodal-bridgetower:latest + - opea/retriever:latest + - opea/whisper:latest + + ##### TGI-based application: + + - ghcr.io/huggingface/text-generation-inference:2.4.1-rocm + - opea/lvm:latest + - opea/multimodalqna:latest + - opea/multimodalqna-ui:latest + - opea/dataprep:latest + - opea/embedding:latest + - opea/embedding-multimodal-bridgetower:latest + - opea/retriever:latest + - opea/whisper:latest +--- +## Deploy the MultimodalQnA Application + +### Docker Compose Configuration for AMD GPUs + +To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file: + +- compose_vllm.yaml - for vLLM-based application +- compose.yaml - for TGI-based + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card0:/dev/dri/card0 + - /dev/dri/renderD128:/dev/dri/renderD128 +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined ``` -### 3. Build retriever-multimodal-redis Image +**How to Identify GPU Device IDs:** +Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. + +### Set deploy environment variables + +#### Setting variables in the operating system environment: + +##### Set variable HUGGINGFACEHUB_API_TOKEN: ```bash -docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile . +### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. +export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' ``` -### 4. Build dataprep-multimodal-redis Image +#### Set variables value in set_env\*\*\*\*.sh file: + +Go to Docker Compose directory: ```bash -docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile . +cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm ``` -### 5. Build MegaService Docker Image +The example uses the Nano text editor. You can use any convenient text editor: -To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the [multimodalqna.py](../../../../multimodalqna.py) Python script. Build MegaService Docker image via below command: +#### If you use vLLM ```bash -git clone https://github.com/opea-project/GenAIExamples.git -cd GenAIExamples/MultimodalQnA -docker build --no-cache -t opea/multimodalqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -cd ../.. 
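+# Hedged reminder (comment only): before sourcing this file, review at least
+# HOST_IP, MULTIMODAL_HUGGINGFACEHUB_API_TOKEN and the *_PORT values shown in
+# set_env_vllm.sh later in this patch; the defaults assume a single-host setup.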
+nano set_env_vllm.sh ``` -### 6. Build UI Docker Image - -Build frontend Docker image via below command: +#### If you use TGI ```bash -cd GenAIExamples/MultimodalQnA/ui/ -docker build --no-cache -t opea/multimodalqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -cd ../../../ +nano set_env.sh ``` -### 7. Pull TGI AMD ROCm Image +If you are in a proxy environment, also set the proxy-related environment variables: ```bash -docker pull ghcr.io/huggingface/text-generation-inference:2.4.1-rocm +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" ``` -Then run the command `docker images`, you will have the following 8 Docker Images: +Set the values of the variables: -1. `opea/dataprep:latest` -2. `ghcr.io/huggingface/text-generation-inference:2.4.1-rocm` -3. `opea/lvm:latest` -4. `opea/retriever:latest` -5. `opea/embedding:latest` -6. `opea/embedding-multimodal-bridgetower:latest` -7. `opea/multimodalqna:latest` -8. `opea/multimodalqna-ui:latest` +- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. -## 🚀 Start Microservices + If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. -### Required Models + If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. + + If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. + + We set these values in the file set_env\*\*\*\*.sh + +- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. + The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. + + +#### Required Models By default, the multimodal-embedding and LVM models are set to a default value as listed below: @@ -108,202 +229,188 @@ Note: For AMD ROCm System "Xkev/Llama-3.2V-11B-cot" is recommended to run on ghcr.io/huggingface/text-generation-inference:2.4.1-rocm -### Start all the services Docker Containers +#### Set variables with script set_env\*\*\*\*.sh -> Before running the docker compose command, you need to be in the folder that has the docker compose yaml file +#### If you use vLLM ```bash -cd GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm -. set_env.sh -docker compose -f compose.yaml up -d +. set_env_vllm.sh ``` -Note: Please replace with `host_ip` with your external IP address, do not use localhost. +#### If you use TGI + +```bash +. 
set_env.sh +``` -Note: In order to limit access to a subset of GPUs, please pass each device individually using one or more -device /dev/dri/rendered, where is the card index, starting from 128. (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus) +### Start the services: -Example for set isolation for 1 GPU +#### If you use vLLM -``` - - /dev/dri/card0:/dev/dri/card0 - - /dev/dri/renderD128:/dev/dri/renderD128 +```bash +docker compose -f compose_vllm.yaml up -d ``` -Example for set isolation for 2 GPUs +#### If you use TGI -``` - - /dev/dri/card0:/dev/dri/card0 - - /dev/dri/renderD128:/dev/dri/renderD128 - - /dev/dri/card1:/dev/dri/card1 - - /dev/dri/renderD129:/dev/dri/renderD129 +```bash +docker compose -f compose.yaml up -d ``` -Please find more information about accessing and restricting AMD GPUs in the link (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus) +All containers should be running and should not restart: -### Validate Microservices +##### If you use vLLM: -1. embedding-multimodal-bridgetower +- multimodalqna-vllm-service +- multimodalqna-lvm +- multimodalqna-backend-server +- multimodalqna-gradio-ui-server +- whisper-service +- embedding-multimodal-bridgetower +- redis-vector-db +- embedding +- retriever-redis +- dataprep-multimodal-redis -```bash -curl http://${host_ip}:${EMBEDDER_PORT}/v1/encode \ - -X POST \ - -H "Content-Type:application/json" \ - -d '{"text":"This is example"}' -``` +##### If you use TGI: -```bash -curl http://${host_ip}:${EMBEDDER_PORT}/v1/encode \ - -X POST \ - -H "Content-Type:application/json" \ - -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' -``` +- tgi-llava-rocm-server +- multimodalqna-lvm +- multimodalqna-backend-server +- multimodalqna-gradio-ui-server +- whisper-service +- embedding-multimodal-bridgetower +- redis-vector-db +- embedding +- retriever-redis +- dataprep-multimodal-redis -2. embedding +--- +## Validate the Services -```bash -curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ - -X POST \ - -H "Content-Type: application/json" \ - -d '{"text" : "This is some sample text."}' -``` +### 1. Validate the vLLM/TGI Service + +#### If you use vLLM: ```bash -curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ - -X POST \ - -H "Content-Type: application/json" \ - -d '{"text": {"text" : "This is some sample text."}, "image" : {"url": "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true"}}' +DATA='{"model": "Xkev/Llama-3.2V-11B-cot", '\ +'"messages": [{"role": "user", "content": ""}], "max_tokens": 256}' + +curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' ``` -3. retriever-multimodal-redis +Checking the response from the service. The response should be similar to JSON: -```bash -export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") -curl http://${host_ip}:7000/v1/retrieval \ - -X POST \ - -H "Content-Type: application/json" \ - -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" -``` +````json -4. 
lvm-llava +```` -```bash -curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \ - -X POST \ - -H "Content-Type:application/json" \ - -d '{"prompt":"Describe the image please.", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' -``` +If the service response has a meaningful response in the value of the "choices.message.content" key, +then we consider the vLLM service to be successfully launched -5. lvm +#### If you use TGI: ```bash -curl http://${host_ip}:9399/v1/lvm \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' -``` +DATA='{"inputs":"",'\ +'"parameters":{"max_new_tokens":256,"do_sample": true}}' -```bash -curl http://${host_ip}:9399/v1/lvm \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}' +curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' ``` -Also, validate LVM Microservice with empty retrieval results +Checking the response from the service. The response should be similar to JSON: -```bash -curl http://${host_ip}:9399/v1/lvm \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' -``` +````json -6. dataprep-multimodal-redis +```` -Download a sample video, image, and audio file and create a caption +If the service response has a meaningful response in the value of the "generated_text" key, +then we consider the TGI service to be successfully launched + +### 2. Validate the LLM Service ```bash -export video_fn="WeAreGoingOnBullrun.mp4" -wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn} +DATA='{"query":"",'\ +'"max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,'\ +'"repetition_penalty":1.03,"stream":false}' + +curl http://${HOST_IP}:${MULTIMODALQNA_LLM_SERVICE_PORT}/v1/chat/completions \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' +``` -export image_fn="apple.png" -wget https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true -O ${image_fn} +Checking the response from the service. The response should be similar to JSON: -export caption_fn="apple.txt" -echo "This is an apple." > ${caption_fn} +````json -export audio_fn="AudioSample.wav" -wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn} -``` +```` -Test dataprep microservice with generating transcript. This command updates a knowledge base by uploading a local video .mp4 and an audio .wav file. 
+If the service response has a meaningful response in the value of the "choices.text" key, +then we consider the vLLM service to be successfully launched + +### 3. Validate the MegaService ```bash -curl --silent --write-out "HTTPSTATUS:%{http_code}" \ - ${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \ - -H 'Content-Type: multipart/form-data' \ - -X POST \ - -F "files=@./${video_fn}" \ - -F "files=@./${audio_fn}" +DATA='{"messages": "Implement a high-level API for a TODO list application. '\ +'The API takes as input an operation request and updates the TODO list in place. '\ +'If the request is invalid, raise an exception."}' + +curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ + -H "Content-Type: application/json" \ + -d "$DATA" ``` -Also, test dataprep microservice with generating an image caption using lvm microservice +Checking the response from the service. The response should be similar to text: -```bash -curl --silent --write-out "HTTPSTATUS:%{http_code}" \ - ${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} \ - -H 'Content-Type: multipart/form-data' \ - -X POST -F "files=@./${image_fn}" +```textmate ``` -Now, test the microservice with posting a custom caption along with an image +If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. + +### 4. Validate MicroServices ```bash -curl --silent --write-out "HTTPSTATUS:%{http_code}" \ - ${DATAPREP_INGEST_SERVICE_ENDPOINT} \ - -H 'Content-Type: multipart/form-data' \ - -X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}" +# whisper service +curl http://${host_ip}:7066/v1/asr \ + -X POST \ + -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ + -H 'Content-Type: application/json' ``` -Also, you are able to get the list of all files that you uploaded: +Checking the response from the service. The response should be similar to text: -```bash -curl -X POST \ - -H "Content-Type: application/json" \ - ${DATAPREP_GET_FILE_ENDPOINT} +```textmate ``` -Then you will get the response python-style LIST like this. Notice the name of each uploaded file e.g., `videoname.mp4` will become `videoname_uuid.mp4` where `uuid` is a unique ID for each uploaded file. The same files that are uploaded twice will have different `uuid`. +### 4. Validate the Frontend (UI) -```bash -[ - "WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4", - "WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4", - "apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png", - "AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav -] -``` +To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${MULTIMODALQNA_UI_SERVICE_PORT} +A page should open when you click through to this address: -To delete all uploaded files along with data indexed with `$INDEX_NAME` in REDIS. +![UI start page](../../../../assets/img/ui-starting-page.png) -```bash -curl -X POST \ - -H "Content-Type: application/json" \ - -d '{"file_path": "all"}' \ - ${DATAPREP_DELETE_FILE_ENDPOINT} -``` +If a page of this type has opened, then we believe that the service is running and responding, +and we can proceed to functional UI testing. + +### 5. Stop application -7. 
MegaService +##### If you use vLLM ```bash -curl http://${host_ip}:8888/v1/multimodalqna \ - -H "Content-Type: application/json" \ - -X POST \ - -d '{"messages": "What is the revenue of Nike in 2023?"}' +cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm +docker compose -f compose_vllm.yaml down ``` +##### If you use TGI + ```bash -curl http://${host_ip}:8888/v1/multimodalqna \ - -H "Content-Type: application/json" \ - -d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10}' +cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm +docker compose -f compose.yaml down ``` + diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml new file mode 100644 index 0000000000..33c00a9490 --- /dev/null +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml @@ -0,0 +1,186 @@ +# Copyright (C) 2024 Advanced Micro Devices, Inc. +# SPDX-License-Identifier: Apache-2.0 + +services: + whisper-service: + image: ${REGISTRY:-opea}/whisper:${TAG:-latest} + container_name: whisper-service + ports: + - "7066:7066" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + restart: unless-stopped + redis-vector-db: + image: redis/redis-stack:7.2.0-v9 + container_name: redis-vector-db + ports: + - "6379:6379" + - "8001:8001" + dataprep-multimodal-redis: + image: ${REGISTRY:-opea}/dataprep:${TAG:-latest} + container_name: dataprep-multimodal-redis + depends_on: + - redis-vector-db + - lvm + ports: + - "6007:5000" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + REDIS_URL: ${REDIS_URL} + REDIS_HOST: ${REDIS_HOST} + INDEX_NAME: ${INDEX_NAME} + LVM_ENDPOINT: "http://${LVM_SERVICE_HOST_IP}:9399/v1/lvm" + HUGGINGFACEHUB_API_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN} + MULTIMODAL_DATAPREP: true + DATAPREP_COMPONENT_NAME: "OPEA_DATAPREP_MULTIMODALREDIS" + restart: unless-stopped + embedding-multimodal-bridgetower: + image: ${REGISTRY:-opea}/embedding-multimodal-bridgetower:${TAG:-latest} + container_name: embedding-multimodal-bridgetower + ports: + - ${EMBEDDER_PORT}:${EMBEDDER_PORT} + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + PORT: ${EMBEDDER_PORT} + healthcheck: + test: ["CMD-SHELL", "http_proxy='' curl -f http://localhost:${EMBEDDER_PORT}/v1/health_check"] + interval: 10s + timeout: 6s + retries: 18 + start_period: 30s + entrypoint: ["python", "bridgetower_server.py", "--device", "cpu", "--model_name_or_path", $EMBEDDING_MODEL_ID] + restart: unless-stopped + embedding: + image: ${REGISTRY:-opea}/embedding:${TAG:-latest} + container_name: embedding + depends_on: + embedding-multimodal-bridgetower: + condition: service_healthy + ports: + - ${MM_EMBEDDING_PORT_MICROSERVICE}:${MM_EMBEDDING_PORT_MICROSERVICE} + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + MMEI_EMBEDDING_ENDPOINT: ${MMEI_EMBEDDING_ENDPOINT} + MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE} + MULTIMODAL_EMBEDDING: true + restart: unless-stopped + retriever-redis: + image: ${REGISTRY:-opea}/retriever:${TAG:-latest} + container_name: 
retriever-redis + depends_on: + - redis-vector-db + ports: + - "7000:7000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + REDIS_URL: ${REDIS_URL} + INDEX_NAME: ${INDEX_NAME} + BRIDGE_TOWER_EMBEDDING: ${BRIDGE_TOWER_EMBEDDING} + LOGFLAG: ${LOGFLAG} + RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS" + restart: unless-stopped + multimodalqna-vllm-service: + image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest} + container_name: multimodalqna-vllm-service + ports: + - "${MULTIMODAL_VLLM_SERVICE_PORT:-8081}:8011" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HUGGINGFACEHUB_API_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN} + HF_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + WILM_USE_TRITON_FLASH_ATTENTION: 0 + PYTORCH_JIT: 0 + volumes: + - "./data:/data" + shm_size: 20G + devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + - apparmor=unconfined + command: "--model ${MULTIMODAL_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 1 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\"" + ipc: host + lvm: + image: ${REGISTRY:-opea}/lvm:${TAG:-latest} + container_name: lvm + depends_on: + - tgi-rocm + ports: + - "9399:9399" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + LVM_COMPONENT_NAME: "OPEA_TGI_LLAVA_LVM" + LVM_ENDPOINT: ${LVM_ENDPOINT} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + restart: unless-stopped + multimodalqna: + image: ${REGISTRY:-opea}/multimodalqna:${TAG:-latest} + container_name: multimodalqna-backend-server + depends_on: + - redis-vector-db + - dataprep-multimodal-redis + - embedding + - retriever-redis + - lvm + ports: + - "8888:8888" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + MEGA_SERVICE_HOST_IP: ${MEGA_SERVICE_HOST_IP} + MM_EMBEDDING_SERVICE_HOST_IP: ${MM_EMBEDDING_SERVICE_HOST_IP} + MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE} + MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP} + LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP} + WHISPER_SERVER_PORT: ${WHISPER_SERVER_PORT} + WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT} + ipc: host + restart: always + multimodalqna-ui: + image: ${REGISTRY:-opea}/multimodalqna-ui:${TAG:-latest} + container_name: multimodalqna-gradio-ui-server + depends_on: + - multimodalqna + ports: + - "5173:5173" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} + - DATAPREP_INGEST_SERVICE_ENDPOINT=${DATAPREP_INGEST_SERVICE_ENDPOINT} + - DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT=${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} + - DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT=${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} + ipc: host + restart: always + +networks: + default: + driver: bridge diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh new file mode 100644 index 0000000000..5cb482bc55 --- /dev/null +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh @@ -0,0 +1,33 @@ +#!/usr/bin/env bash + +# Copyright (C) 2024 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0

export HOST_IP=${your_host_ip_address}
export MULTIMODAL_HUGGINGFACEHUB_API_TOKEN=${your_huggingfacehub_token}
export MULTIMODAL_VLLM_SERVICE_PORT="8399"
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
export BRIDGE_TOWER_EMBEDDING=true
export EMBEDDER_PORT=6006
export MMEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:$EMBEDDER_PORT"
export MM_EMBEDDING_PORT_MICROSERVICE=6000
export REDIS_URL="redis://${HOST_IP}:6379"
export REDIS_HOST=${HOST_IP}
export INDEX_NAME="mm-rag-redis"
export LLAVA_SERVER_PORT=8399
export LVM_ENDPOINT="http://${HOST_IP}:8399"
export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot"
export WHISPER_MODEL="base"
export MM_EMBEDDING_SERVICE_HOST_IP=${HOST_IP}
export MM_RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
export LVM_SERVICE_HOST_IP=${HOST_IP}
export MEGA_SERVICE_HOST_IP=${HOST_IP}
export BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:8888/v1/multimodalqna"
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/ingest"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions"
export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete"
diff --git a/MultimodalQnA/docker_image_build/build.yaml b/MultimodalQnA/docker_image_build/build.yaml
index 1fc599c3e5..643f1e7c68 100644
--- a/MultimodalQnA/docker_image_build/build.yaml
+++ b/MultimodalQnA/docker_image_build/build.yaml
@@ -65,3 +65,8 @@ services:
      dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
    extends: multimodalqna
    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
+  vllm-rocm:
+    build:
+      context: GenAIComps
+      dockerfile: comps/third_parties/vllm/src/Dockerfile.amd_gpu
+    image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh
new file mode 100644
index 0000000000..0ff44635a8
--- /dev/null
+++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh
@@ -0,0 +1,331 @@
#!/bin/bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

set -ex
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}

WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')

export image_fn="apple.png"
export video_fn="WeAreGoingOnBullrun.mp4"
export caption_fn="apple.txt"

function build_docker_images() {
    opea_branch=${opea_branch:-"main"}
    # If the opea_branch isn't main, replace the git clone branch in Dockerfile.
    if [[ "${opea_branch}" != "main" ]]; then
        cd $WORKPATH
        OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
        NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
        find . 
-type f -name "Dockerfile*" | while read -r file; do + echo "Processing file: $file" + sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file" + done + fi + + cd $WORKPATH/docker_image_build + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ + echo "Build all the images with --no-cache, check docker_image_build.log for details..." + service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper vllm-rocm" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + + docker images && sleep 1m +} + +function setup_env() { + export HOST_IP=${ip_address} + export host_ip=${ip_address} + export MULTIMODAL_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} + export MULTIMODAL_VLLM_SERVICE_PORT="8399" + export no_proxy=${your_no_proxy} + export http_proxy=${your_http_proxy} + export https_proxy=${your_http_proxy} + export BRIDGE_TOWER_EMBEDDING=true + export EMBEDDER_PORT=6006 + export MMEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:$EMBEDDER_PORT" + export MM_EMBEDDING_PORT_MICROSERVICE=6000 + export WHISPER_SERVER_PORT=7066 + export WHISPER_SERVER_ENDPOINT="http://${HOST_IP}:${WHISPER_SERVER_PORT}/v1/asr" + export REDIS_URL="redis://${HOST_IP}:6379" + export REDIS_HOST=${HOST_IP} + export INDEX_NAME="mm-rag-redis" + export LVM_ENDPOINT="http://${HOST_IP}:8399" + export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc" + export LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" + export WHISPER_MODEL="base" + export MM_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} + export MM_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} + export LVM_SERVICE_HOST_IP=${HOST_IP} + export MEGA_SERVICE_HOST_IP=${HOST_IP} + export BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:8888/v1/multimodalqna" + export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/ingest" + export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_transcripts" + export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions" + export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get" + export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" +} + +function start_services() { + cd $WORKPATH/docker_compose/amd/gpu/rocm + docker compose -f compose_vllm.yaml up -d > ${LOG_PATH}/start_services_with_compose.log + sleep 1m +} + +function prepare_data() { + cd $LOG_PATH + echo "Downloading image and video" + wget https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true -O ${image_fn} + wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn} + echo "Writing caption file" + echo "This is an apple." 
> ${caption_fn} + sleep 1m +} + + +function validate_service() { + local URL="$1" + local EXPECTED_RESULT="$2" + local SERVICE_NAME="$3" + local DOCKER_NAME="$4" + local INPUT_DATA="$5" + + if [[ $SERVICE_NAME == *"dataprep-multimodal-redis-transcript"* ]]; then + cd $LOG_PATH + HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -F "files=@./${video_fn}" -H 'Content-Type: multipart/form-data' "$URL") + elif [[ $SERVICE_NAME == *"dataprep-multimodal-redis-caption"* ]]; then + cd $LOG_PATH + HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -F "files=@./${image_fn}" -H 'Content-Type: multipart/form-data' "$URL") + elif [[ $SERVICE_NAME == *"dataprep-multimodal-redis-ingest"* ]]; then + cd $LOG_PATH + HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -F "files=@./${image_fn}" -F "files=@./apple.txt" -H 'Content-Type: multipart/form-data' "$URL") + elif [[ $SERVICE_NAME == *"dataprep_get"* ]]; then + HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -H 'Content-Type: application/json' "$URL") + elif [[ $SERVICE_NAME == *"dataprep_del"* ]]; then + HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -d '{"file_path": "apple.txt"}' -H 'Content-Type: application/json' "$URL") + else + HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL") + fi + HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://') + RESPONSE_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g') + + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + + # check response status + if [ "$HTTP_STATUS" -ne "200" ]; then + echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS" + exit 1 + else + echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..." + fi + # check response body + if [[ "$RESPONSE_BODY" != *"$EXPECTED_RESULT"* ]]; then + echo "[ $SERVICE_NAME ] Content does not match the expected result: $RESPONSE_BODY" + exit 1 + else + echo "[ $SERVICE_NAME ] Content is as expected." + fi + + sleep 1s +} + +function validate_microservices() { + # Check if the microservices are running correctly. 
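    # validate_service (defined above) takes five arguments: the endpoint URL,
    # a substring the response body must contain, a service label (which selects
    # the request shape for the dataprep cases), the container whose logs are
    # appended under LOG_PATH, and an optional JSON payload for plain POSTs.
    # It aborts the whole run on a non-200 status or a missing substring.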
+ + # Bridgetower Embedding Server + echo "Validating embedding-multimodal-bridgetower" + validate_service \ + "http://${host_ip}:${EMBEDDER_PORT}/v1/encode" \ + '"embedding":[' \ + "embedding-multimodal-bridgetower" \ + "embedding-multimodal-bridgetower" \ + '{"text":"This is example"}' + + validate_service \ + "http://${host_ip}:${EMBEDDER_PORT}/v1/encode" \ + '"embedding":[' \ + "embedding-multimodal-bridgetower" \ + "embedding-multimodal-bridgetower" \ + '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' + + # embedding microservice + echo "Validating embedding" + validate_service \ + "http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings" \ + '"embedding":[' \ + "embedding" \ + "embedding" \ + '{"text" : "This is some sample text."}' + + validate_service \ + "http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings" \ + '"embedding":[' \ + "embedding" \ + "embedding" \ + '{"text": {"text" : "This is some sample text."}, "image" : {"url": "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true"}}' + + sleep 1m # retrieval can't curl as expected, try to wait for more time + + # test data prep + echo "Data Prep with Generating Transcript for Video" + validate_service \ + "${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT}" \ + "Data preparation succeeded" \ + "dataprep-multimodal-redis-transcript" \ + "dataprep-multimodal-redis" + + echo "Data Prep with Image & Caption Ingestion" + validate_service \ + "${DATAPREP_INGEST_SERVICE_ENDPOINT}" \ + "Data preparation succeeded" \ + "dataprep-multimodal-redis-ingest" \ + "dataprep-multimodal-redis" + + echo "Validating get file returns mp4" + validate_service \ + "${DATAPREP_GET_FILE_ENDPOINT}" \ + '.mp4' \ + "dataprep_get" \ + "dataprep-multimodal-redis" + + echo "Validating get file returns png" + validate_service \ + "${DATAPREP_GET_FILE_ENDPOINT}" \ + '.png' \ + "dataprep_get" \ + "dataprep-multimodal-redis" + + sleep 2m + + # multimodal retrieval microservice + echo "Validating retriever-redis" + your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") + validate_service \ + "http://${host_ip}:7000/v1/retrieval" \ + "retrieved_docs" \ + "retriever-redis" \ + "retriever-redis" \ + "{\"text\":\"test\",\"embedding\":${your_embedding}}" + + sleep 5m + + #vLLM Service + validate_service \ + "${host_ip}:${MULTIMODAL_VLLM_SERVICE_PORT}/v1/chat/completions" \ + "content" \ + "multimodalqna-vllm-service" \ + "multimodalqna-vllm-service" \ + '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png}}]"}], "max_tokens": 17}' + + + # lvm + echo "Evaluating lvm" + validate_service \ + "http://${host_ip}:9399/v1/lvm" \ + '"text":"' \ + "lvm" \ + "lvm" \ + '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. 
{question}"}' + + # data prep requiring lvm + echo "Data Prep with Generating Caption for Image" + validate_service \ + "${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT}" \ + "Data preparation succeeded" \ + "dataprep-multimodal-redis-caption" \ + "dataprep-multimodal-redis" + + sleep 3m +} + +function validate_megaservice() { + # Curl the Mega Service with retrieval + echo "Validate megaservice with first query" + validate_service \ + "http://${host_ip}:8888/v1/multimodalqna" \ + '"time_of_frame_ms":' \ + "multimodalqna" \ + "multimodalqna-backend-server" \ + '{"messages": "What is the revenue of Nike in 2023?"}' + + echo "Validate megaservice with first audio query" + validate_service \ + "http://${host_ip}:8888/v1/multimodalqna" \ + '"time_of_frame_ms":' \ + "multimodalqna" \ + "multimodalqna-backend-server" \ + '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}' + + echo "Validate megaservice with follow-up query" + validate_service \ + "http://${host_ip}:8888/v1/multimodalqna" \ + '"content":"' \ + "multimodalqna" \ + "multimodalqna-backend-server" \ + '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": [{"type": "text", "text": "goodbye"}]}]}' + + echo "Validate megaservice with multiple text queries" + validate_service \ + "http://${host_ip}:8888/v1/multimodalqna" \ + '"content":"' \ + "multimodalqna" \ + "multimodalqna-backend-server" \ + '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}]}, {"role": "assistant", "content": "opea project! 
"}, {"role": "user", "content": [{"type": "text", "text": "goodbye"}]}]}' +} + +function validate_delete { + echo "Validate data prep delete files" + export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" + validate_service \ + "${DATAPREP_DELETE_FILE_ENDPOINT}" \ + '{"status":true}' \ + "dataprep_del" \ + "dataprep-multimodal-redis" +} + +function delete_data() { + cd $LOG_PATH + echo "Deleting image, video, and caption" + rm -rf ${image_fn} + rm -rf ${video_fn} + rm -rf ${caption_fn} +} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm + docker compose -f compose.yaml stop && docker compose -f compose.yaml rm -f +} + +function main() { + + setup_env + stop_docker + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_time=$(date +%s) + start_services + end_time=$(date +%s) + duration=$((end_time-start_time)) + echo "Mega service start duration is $duration s" && sleep 1s + prepare_data + + validate_microservices + echo "==== microservices validated ====" + validate_megaservice + echo "==== megaservice validated ====" + validate_delete + echo "==== delete validated ====" + + delete_data + stop_docker + echo y | docker system prune + +} + +main From 4fa7a40b469e2339d51f99e843e334bc86111e5e Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 1 Apr 2025 11:03:22 +0000 Subject: [PATCH 02/25] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .../docker_compose/amd/gpu/rocm/README.md | 33 ++++++++++--------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index fc4e95b1d2..4aaf71b8c5 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -116,8 +116,10 @@ After launching your instance, you can connect to it using SSH (for Linux instan - opea/embedding:latest - opea/embedding-multimodal-bridgetower:latest - opea/retriever:latest - - opea/whisper:latest + - opea/whisper:latest + --- + ## Deploy the MultimodalQnA Application ### Docker Compose Configuration for AMD GPUs @@ -214,7 +216,6 @@ Set the values of the variables: - **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. 
- #### Required Models By default, the multimodal-embedding and LVM models are set to a default value as listed below: @@ -262,30 +263,31 @@ All containers should be running and should not restart: ##### If you use vLLM: - multimodalqna-vllm-service -- multimodalqna-lvm +- multimodalqna-lvm - multimodalqna-backend-server - multimodalqna-gradio-ui-server - whisper-service -- embedding-multimodal-bridgetower +- embedding-multimodal-bridgetower - redis-vector-db - embedding -- retriever-redis +- retriever-redis - dataprep-multimodal-redis ##### If you use TGI: - tgi-llava-rocm-server -- multimodalqna-lvm +- multimodalqna-lvm - multimodalqna-backend-server - multimodalqna-gradio-ui-server - whisper-service -- embedding-multimodal-bridgetower +- embedding-multimodal-bridgetower - redis-vector-db - embedding -- retriever-redis +- retriever-redis - dataprep-multimodal-redis --- + ## Validate the Services ### 1. Validate the vLLM/TGI Service @@ -304,9 +306,9 @@ curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ Checking the response from the service. The response should be similar to JSON: -````json +```json -```` +``` If the service response has a meaningful response in the value of the "choices.message.content" key, then we consider the vLLM service to be successfully launched @@ -325,9 +327,9 @@ curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ Checking the response from the service. The response should be similar to JSON: -````json +```json -```` +``` If the service response has a meaningful response in the value of the "generated_text" key, then we consider the TGI service to be successfully launched @@ -347,9 +349,9 @@ curl http://${HOST_IP}:${MULTIMODALQNA_LLM_SERVICE_PORT}/v1/chat/completions \ Checking the response from the service. The response should be similar to JSON: -````json +```json -```` +``` If the service response has a meaningful response in the value of the "choices.text" key, then we consider the vLLM service to be successfully launched @@ -369,6 +371,7 @@ curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ Checking the response from the service. The response should be similar to text: ```textmate + ``` If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. @@ -386,6 +389,7 @@ curl http://${host_ip}:7066/v1/asr \ Checking the response from the service. The response should be similar to text: ```textmate + ``` ### 4. 
Validate the Frontend (UI) @@ -413,4 +417,3 @@ docker compose -f compose_vllm.yaml down cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm docker compose -f compose.yaml down ``` - From 9732976241ecff2f1b246a379922d7f4c539ae98 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 12:26:30 +0700 Subject: [PATCH 03/25] update Readme.md and test for vllm Signed-off-by: Artem Astafev --- .../docker_compose/amd/gpu/rocm/README.md | 123 ++++++++++++++++-- .../docker_compose/amd/gpu/rocm/set_env.sh | 2 + .../amd/gpu/rocm/set_env_vllm.sh | 2 + .../tests/test_compose_vllm_on_rocm.sh | 18 +-- 4 files changed, 123 insertions(+), 22 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 4aaf71b8c5..533b6bd0c2 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -356,30 +356,95 @@ Checking the response from the service. The response should be similar to JSON: If the service response has a meaningful response in the value of the "choices.text" key, then we consider the vLLM service to be successfully launched -### 3. Validate the MegaService +### 3. Validate MicroServices +##### embedding-multimodal-bridgetower + +Text example: ```bash -DATA='{"messages": "Implement a high-level API for a TODO list application. '\ -'The API takes as input an operation request and updates the TODO list in place. '\ -'If the request is invalid, raise an exception."}' +curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ + -X POST \ + -H "Content-Type:application/json" \ + -d '{"text":"This is example"}' +``` -curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ - -H "Content-Type: application/json" \ - -d "$DATA" +Checking the response from the service. The response should be similar to text: + +```textmate +{"embedding":[0.036936961114406586,-0.0022056063171476126,0.0891181230545044,-0.019263656809926033,-0.049174826592206955,-0.05129311606287956,-0.07172256708145142,0.04365323856472969,0.03275766223669052,0.0059910244308412075,-0.0301326...,-0.0031989417038857937,0.042092420160770416]} +``` + +Image example: +```bash +curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ + -X POST \ + -H "Content-Type:application/json" \ + -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' ``` Checking the response from the service. The response should be similar to text: ```textmate +{"embedding":[0.024372786283493042,-0.003916610032320023,0.07578050345182419,...,-0.046543147414922714]} +``` +##### embedding + +Text example: +```bash +curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ + -X POST \ + -H "Content-Type: application/json" \ + -d '{"text" : "This is some sample text."}' ``` -If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. +Checking the response from the service. The response should be similar to text: -### 4. 
Validate MicroServices +```textmate +{"id":"4fb722012a2719e38188190e1cb37ed3","text":"This is some sample text.","embedding":[0.043303076177835464,-0.051807764917612076,...,-0.0005179636646062136,-0.0027774290647357702],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":null,"base64_image":null} +``` + +Image example: +```bash +curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ + -X POST \ + -H "Content-Type:application/json" \ + -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} +``` + + +##### retriever-multimodal-redis + +set "your_embedding" variable: +```bash +export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") +``` + +Test Redis retriever +```bash +curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \ + -X POST \ + -H "Content-Type: application/json" \ + -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} +``` + + + +##### whisper service ```bash -# whisper service curl http://${host_ip}:7066/v1/asr \ -X POST \ -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ @@ -388,21 +453,53 @@ curl http://${host_ip}:7066/v1/asr \ Checking the response from the service. The response should be similar to text: +```textmate +{"asr_result":"you"} +``` + +##### lvm + +```bash + +``` + +Checking the response from the service. The response should be similar to text: + ```textmate ``` -### 4. Validate the Frontend (UI) +### 4. Validate the MegaService + +```bash +DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}' + +curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ + -H "Content-Type: application/json" \ + -d "$DATA" +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"id":"chatcmpl-75aK2KWCfxZmVcfh5tiiHj","object":"chat.completion","created":1743568232,"model":"multimodalqna","choices":[{"index":0,"message":{"role":"assistant","content":"There is no video segments retrieved given the query!"},"finish_reason":"stop","metadata":{"audio":"you"}}],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0}} +``` + +If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. + + + +### 5. 
Validate the Frontend (UI) To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${MULTIMODALQNA_UI_SERVICE_PORT} A page should open when you click through to this address: -![UI start page](../../../../assets/img/ui-starting-page.png) +![UI start page](../../../../assets/img/mmqna-ui.png) If a page of this type has opened, then we believe that the service is running and responding, and we can proceed to functional UI testing. -### 5. Stop application +### 6. Stop application ##### If you use vLLM diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env.sh b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env.sh index 5cb482bc55..5c7516e7a4 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env.sh +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env.sh @@ -31,3 +31,5 @@ export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/datap export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions" export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get" export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" +export WHISPER_PORT="7066" +export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr" diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh index 5cb482bc55..5c7516e7a4 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh @@ -31,3 +31,5 @@ export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/datap export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions" export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get" export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" +export WHISPER_PORT="7066" +export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr" diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index 0ff44635a8..d56245c8d4 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -222,17 +222,17 @@ function validate_microservices() { "content" \ "multimodalqna-vllm-service" \ "multimodalqna-vllm-service" \ - '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png}}]"}], "max_tokens": 17}' + '{"model": "Xkev/Llama-3.2V-11B-cot", "messages": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png}}]"}], "max_tokens": 17}' - # lvm - echo "Evaluating lvm" - validate_service \ - "http://${host_ip}:9399/v1/lvm" \ - '"text":"' \ - "lvm" \ - "lvm" \ - '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: 
'\''{context}'\''. {question}"}' +# # lvm +# echo "Evaluating lvm" +# validate_service \ +# "http://${host_ip}:9399/v1/lvm" \ +# '"text":"' \ +# "lvm" \ +# "lvm" \ +# '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' # data prep requiring lvm echo "Data Prep with Generating Caption for Image" From f42da6a70bc44afe854ca8b01db08745c3ce6f9a Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 2 Apr 2025 05:27:35 +0000 Subject: [PATCH 04/25] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 533b6bd0c2..b8394f43ea 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -361,6 +361,7 @@ then we consider the vLLM service to be successfully launched ##### embedding-multimodal-bridgetower Text example: + ```bash curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ -X POST \ @@ -375,6 +376,7 @@ Checking the response from the service. The response should be similar to text: ``` Image example: + ```bash curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ -X POST \ @@ -391,6 +393,7 @@ Checking the response from the service. The response should be similar to text: ##### embedding Text example: + ```bash curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ -X POST \ @@ -405,6 +408,7 @@ Checking the response from the service. The response should be similar to text: ``` Image example: + ```bash curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ -X POST \ @@ -418,15 +422,16 @@ Checking the response from the service. The response should be similar to text: {"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} ``` - ##### retriever-multimodal-redis set "your_embedding" variable: + ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") ``` Test Redis retriever + ```bash curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \ -X POST \ @@ -440,8 +445,6 @@ Checking the response from the service. The response should be similar to text: {"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` - - ##### whisper service ```bash @@ -487,8 +490,6 @@ Checking the response from the service. 
The response should be similar to text: If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. - - ### 5. Validate the Frontend (UI) To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${MULTIMODALQNA_UI_SERVICE_PORT} From 690432d370a5898ba92a323c7dc394087e68325f Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 12:32:26 +0700 Subject: [PATCH 05/25] Update README.md Signed-off-by: Artem Astafev --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 533b6bd0c2..56bd8a5f51 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -358,7 +358,7 @@ then we consider the vLLM service to be successfully launched ### 3. Validate MicroServices -##### embedding-multimodal-bridgetower +###### embedding-multimodal-bridgetower Text example: ```bash @@ -388,7 +388,7 @@ Checking the response from the service. The response should be similar to text: {"embedding":[0.024372786283493042,-0.003916610032320023,0.07578050345182419,...,-0.046543147414922714]} ``` -##### embedding +###### embedding Text example: ```bash @@ -419,7 +419,7 @@ Checking the response from the service. The response should be similar to text: ``` -##### retriever-multimodal-redis +###### retriever-multimodal-redis set "your_embedding" variable: ```bash @@ -442,7 +442,7 @@ Checking the response from the service. The response should be similar to text: -##### whisper service +###### whisper service ```bash curl http://${host_ip}:7066/v1/asr \ @@ -457,7 +457,7 @@ Checking the response from the service. The response should be similar to text: {"asr_result":"you"} ``` -##### lvm +###### lvm ```bash From f2034ca042711da4d22551d46b25202a0fe01687 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 2 Apr 2025 05:33:44 +0000 Subject: [PATCH 06/25] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 56bd8a5f51..e00834ee61 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -361,6 +361,7 @@ then we consider the vLLM service to be successfully launched ###### embedding-multimodal-bridgetower Text example: + ```bash curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ -X POST \ @@ -375,6 +376,7 @@ Checking the response from the service. The response should be similar to text: ``` Image example: + ```bash curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ -X POST \ @@ -391,6 +393,7 @@ Checking the response from the service. The response should be similar to text: ###### embedding Text example: + ```bash curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ -X POST \ @@ -405,6 +408,7 @@ Checking the response from the service. The response should be similar to text: ``` Image example: + ```bash curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ -X POST \ @@ -418,15 +422,16 @@ Checking the response from the service. 
The response should be similar to text: {"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} ``` - ###### retriever-multimodal-redis set "your_embedding" variable: + ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") ``` Test Redis retriever + ```bash curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \ -X POST \ @@ -440,8 +445,6 @@ Checking the response from the service. The response should be similar to text: {"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` - - ###### whisper service ```bash @@ -487,8 +490,6 @@ Checking the response from the service. The response should be similar to text: If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. - - ### 5. Validate the Frontend (UI) To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${MULTIMODALQNA_UI_SERVICE_PORT} From e16aaf5d171f2579f56ee6d766a652c83f1bf010 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 12:38:53 +0700 Subject: [PATCH 07/25] Update README.md Signed-off-by: Artem Astafev --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index e00834ee61..4d313dcc99 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -358,7 +358,7 @@ then we consider the vLLM service to be successfully launched ### 3. Validate MicroServices -###### embedding-multimodal-bridgetower +#### embedding-multimodal-bridgetower Text example: @@ -390,7 +390,7 @@ Checking the response from the service. The response should be similar to text: {"embedding":[0.024372786283493042,-0.003916610032320023,0.07578050345182419,...,-0.046543147414922714]} ``` -###### embedding +#### embedding Text example: @@ -422,7 +422,7 @@ Checking the response from the service. The response should be similar to text: {"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} ``` -###### retriever-multimodal-redis +#### retriever-multimodal-redis set "your_embedding" variable: @@ -445,7 +445,7 @@ Checking the response from the service. 
The response should be similar to text:
 {"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]}
 ```

-###### whisper service
+#### whisper service

 ```bash
 curl http://${host_ip}:7066/v1/asr \
@@ -457,7 +457,7 @@ Checking the response from the service. The response should be similar to text:
 {"asr_result":"you"}
 ```

-###### lvm
+#### lvm

 ```bash

From b552fe162ef63dd379a6f81276e9185f33f9bd94 Mon Sep 17 00:00:00 2001
From: Artem Astafev <a.astafev@datamonsters.com>
Date: Wed, 2 Apr 2025 12:47:48 +0700
Subject: [PATCH 08/25] Update README.md and tests for vLLM

Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
---
 .../docker_compose/amd/gpu/rocm/README.md     |  4 ++--
 .../tests/test_compose_vllm_on_rocm.sh        | 18 ------------------
 2 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md
index 4d313dcc99..f65944465a 100644
--- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md
+++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md
@@ -502,14 +502,14 @@ and we can proceed to functional UI testing.

 ### 6. Stop application

-##### If you use vLLM
+#### If you use vLLM

 ```bash
 cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm
 docker compose -f compose_vllm.yaml down
 ```

-##### If you use TGI
+#### If you use TGI

 ```bash
 cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm
diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh
index d56245c8d4..02528603cd 100644
--- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh
+++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh
@@ -224,24 +224,6 @@ function validate_microservices() {
         "multimodalqna-vllm-service" \
         '{"model": "Xkev/Llama-3.2V-11B-cot", "messages": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png}}]"}], "max_tokens": 17}'

-
-#    # lvm
-#    echo "Evaluating lvm"
-#    validate_service \
-#        "http://${host_ip}:9399/v1/lvm" \
-#        '"text":"' \
-#        "lvm" \
-#        "lvm" \
-#        '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. 
{question}"}' - - # data prep requiring lvm - echo "Data Prep with Generating Caption for Image" - validate_service \ - "${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT}" \ - "Data preparation succeeded" \ - "dataprep-multimodal-redis-caption" \ - "dataprep-multimodal-redis" - sleep 3m } From fa37f35ad93307458dd97490a615db03222b8357 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 13:09:30 +0700 Subject: [PATCH 09/25] Update test_compose_vllm_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_vllm_on_rocm.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index 02528603cd..8d353fce6b 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -59,7 +59,7 @@ function setup_env() { export INDEX_NAME="mm-rag-redis" export LVM_ENDPOINT="http://${HOST_IP}:8399" export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc" - export LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" + export MULTIMODAL_LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" export WHISPER_MODEL="base" export MM_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} export MM_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} From a95753889564e6dcaaf6d22e0ddf8250c8a04ea7 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 15:50:39 +0700 Subject: [PATCH 10/25] update readme.md and tests Signed-off-by: Artem Astafev --- .../docker_compose/amd/gpu/rocm/README.md | 43 ++++--------------- .../docker_compose/amd/gpu/rocm/compose.yaml | 2 +- .../amd/gpu/rocm/compose_vllm.yaml | 7 +-- .../amd/gpu/rocm/set_env_vllm.sh | 4 +- 4 files changed, 16 insertions(+), 40 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index f65944465a..f5cf300b6d 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -334,23 +334,19 @@ Checking the response from the service. The response should be similar to JSON: If the service response has a meaningful response in the value of the "generated_text" key, then we consider the TGI service to be successfully launched -### 2. Validate the LLM Service +### 2. Validate the LVM Service -```bash -DATA='{"query":"",'\ -'"max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,'\ -'"repetition_penalty":1.03,"stream":false}' - -curl http://${HOST_IP}:${MULTIMODALQNA_LLM_SERVICE_PORT}/v1/chat/completions \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' +```bash +curl http://${host_ip}:${MULTIMODALQNA_LVM_PORT}/v1/lvm \ + -X POST \ + -H 'Content-Type: application/json' \ + -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' ``` Checking the response from the service. The response should be similar to JSON: -```json - +```textmate +{"downstream_black_list":[],"id":"1b17e903e8c773be909bde0e7cfdb53f","text":" I will analyze the image and provide a detailed description based on its visual characteristics. I will then compare these characteristics to the standard answer provided to ensure accuracy.\n\n1. **Examine the Image**: The image is a solid color, which appears to be a shade of yellow. There are no additional elements or patterns present in the image.\n\n2. 
**Compare with Standard Answer**: The standard answer describes the image as a \"yellow image\" without any additional details or context. This matches the observed characteristics of the image being a single, uniform yellow color.\n\n3. **Conclusion**: Based on the visual analysis and comparison with the standard answer, the image can be accurately described as a \"yellow image.\" There are no other features or elements present that would alter this description.\n\nFINAL ANSWER: The image is a yellow image.","metadata":{"video_id":"8c7461df-b373-4a00-8696-9a2234359fe0","source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4","time_of_frame_ms":"37000000","transcript_for_inference":"yellow image"}} ``` If the service response has a meaningful response in the value of the "choices.text" key, @@ -460,17 +456,6 @@ Checking the response from the service. The response should be similar to text: {"asr_result":"you"} ``` -#### lvm - -```bash - -``` - -Checking the response from the service. The response should be similar to text: - -```textmate - -``` ### 4. Validate the MegaService @@ -490,17 +475,7 @@ Checking the response from the service. The response should be similar to text: If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. -### 5. Validate the Frontend (UI) - -To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${MULTIMODALQNA_UI_SERVICE_PORT} -A page should open when you click through to this address: - -![UI start page](../../../../assets/img/mmqna-ui.png) - -If a page of this type has opened, then we believe that the service is running and responding, -and we can proceed to functional UI testing. - -### 6. Stop application +### 5. 
Stop application #### If you use vLLM diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml index 3e6fdfab05..02c94e07fc 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml @@ -156,7 +156,7 @@ services: MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE} MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP} LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP} - WHISPER_SERVER_PORT: ${WHISPER_SERVER_PORT} + WHISPER_SERVER_PORT: ${WHISPER_PORT} WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT} ipc: host restart: always diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml index 33c00a9490..d49341bbe5 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml @@ -126,7 +126,7 @@ services: image: ${REGISTRY:-opea}/lvm:${TAG:-latest} container_name: lvm depends_on: - - tgi-rocm + - multimodalqna-vllm-service ports: - "9399:9399" ipc: host @@ -134,8 +134,9 @@ services: no_proxy: ${no_proxy} http_proxy: ${http_proxy} https_proxy: ${https_proxy} - LVM_COMPONENT_NAME: "OPEA_TGI_LLAVA_LVM" + LVM_COMPONENT_NAME: "OPEA_VLLM_LVM" LVM_ENDPOINT: ${LVM_ENDPOINT} + LLM_MODEL_ID: ${MULTIMODAL_LLM_MODEL_ID} HF_HUB_DISABLE_PROGRESS_BARS: 1 HF_HUB_ENABLE_HF_TRANSFER: 0 restart: unless-stopped @@ -159,7 +160,7 @@ services: MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE} MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP} LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP} - WHISPER_SERVER_PORT: ${WHISPER_SERVER_PORT} + WHISPER_SERVER_PORT: ${WHISPER_PORT} WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT} ipc: host restart: always diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh index 5c7516e7a4..623d0c5272 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh @@ -16,8 +16,8 @@ export MM_EMBEDDING_PORT_MICROSERVICE=6000 export REDIS_URL="redis://${HOST_IP}:6379" export REDIS_HOST=${HOST_IP} export INDEX_NAME="mm-rag-redis" -export LLAVA_SERVER_PORT=8399 -export LVM_ENDPOINT="http://${HOST_IP}:8399" +export VLLM_SERVER_PORT=8081 +export LVM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVER_PORT}" export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc" export LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" export WHISPER_MODEL="base" From 57bbbae9abed19f6268472993bb4276f100578ce Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 2 Apr 2025 08:51:16 +0000 Subject: [PATCH 11/25] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index f5cf300b6d..fa0fc86c87 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -336,7 +336,7 @@ then we consider the TGI service to be successfully launched ### 2. 
Validate the LVM Service -```bash +```bash curl http://${host_ip}:${MULTIMODALQNA_LVM_PORT}/v1/lvm \ -X POST \ -H 'Content-Type: application/json' \ @@ -456,7 +456,6 @@ Checking the response from the service. The response should be similar to text: {"asr_result":"you"} ``` - ### 4. Validate the MegaService ```bash From 186f4cd774e41531f70ccd5de484676212036eee Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 16:29:45 +0700 Subject: [PATCH 12/25] Update test_compose_vllm_on_rocm.sh Signed-off-by: Artem Astafev --- .../tests/test_compose_vllm_on_rocm.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index 8d353fce6b..2a3291ee25 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -217,6 +217,7 @@ function validate_microservices() { sleep 5m #vLLM Service + echo "Evaluating vllm" validate_service \ "${host_ip}:${MULTIMODAL_VLLM_SERVICE_PORT}/v1/chat/completions" \ "content" \ @@ -224,6 +225,23 @@ function validate_microservices() { "multimodalqna-vllm-service" \ '{"model": "Xkev/Llama-3.2V-11B-cot", "messages": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png}}]"}], "max_tokens": 17}' + # lvm + echo "Evaluating lvm" + validate_service \ + "http://${host_ip}:9399/v1/lvm" \ + '"text":"' \ + "lvm" \ + "lvm" \ + '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. 
{question}"}' + + # data prep requiring lvm + echo "Data Prep with Generating Caption for Image" + validate_service \ + "${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT}" \ + "Data preparation succeeded" \ + "dataprep-multimodal-redis-caption" \ + "dataprep-multimodal-redis" + sleep 3m } From e5957a9f54ecc92ac30b45b15bae181d695b13fa Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 17:22:04 +0700 Subject: [PATCH 13/25] add ${MODEL_CACHE var Signed-off-by: Artem Astafev --- MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml | 2 +- MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml index 02c94e07fc..bd5a96e298 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml @@ -105,7 +105,7 @@ services: HUGGINGFACEHUB_API_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN} HUGGING_FACE_HUB_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN} volumes: - - "/var/opea/multimodalqna-service/data:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 64g devices: - /dev/kfd:/dev/kfd diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml index d49341bbe5..82afa697d7 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml @@ -108,7 +108,7 @@ services: WILM_USE_TRITON_FLASH_ATTENTION: 0 PYTORCH_JIT: 0 volumes: - - "./data:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 20G devices: - /dev/kfd:/dev/kfd From b5780af4ad38aee7769b5c8024c99c2629d36797 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 17:52:33 +0700 Subject: [PATCH 14/25] Update test_compose_vllm_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_vllm_on_rocm.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index 2a3291ee25..fa9b49a6ca 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -59,7 +59,7 @@ function setup_env() { export INDEX_NAME="mm-rag-redis" export LVM_ENDPOINT="http://${HOST_IP}:8399" export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc" - export MULTIMODAL_LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" + export MULTIMODAL_LLM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" export WHISPER_MODEL="base" export MM_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} export MM_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} From e00e685ee964be829cd9a8d0aa9635a7919a6068 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 18:22:18 +0700 Subject: [PATCH 15/25] Update test_compose_vllm_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_vllm_on_rocm.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index fa9b49a6ca..251a24081e 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -223,7 +223,7 @@ function validate_microservices() { "content" \ "multimodalqna-vllm-service" \ "multimodalqna-vllm-service" \ - '{"model": "Xkev/Llama-3.2V-11B-cot", "messages": [{"role": "user", "content": [{"type": "text", 
"text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png}}]"}], "max_tokens": 17}' + '{"model": "Xkev/Llama-3.2V-11B-cot", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 17}' # lvm echo "Evaluating lvm" From 290e370bbb48c3ffcb398484c217258e19fdf347 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 19:08:28 +0700 Subject: [PATCH 16/25] Update README.md Signed-off-by: Artem Astafev --- .../docker_compose/amd/gpu/rocm/README.md | 30 ++++++++++++++++--- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index fa0fc86c87..14e66d989a 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -296,7 +296,7 @@ All containers should be running and should not restart: ```bash DATA='{"model": "Xkev/Llama-3.2V-11B-cot", '\ -'"messages": [{"role": "user", "content": ""}], "max_tokens": 256}' +'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}' curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ -X POST \ @@ -307,7 +307,27 @@ curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ Checking the response from the service. The response should be similar to JSON: ```json - +{ + "id": "chatcmpl-a3761920c4034131b3cab073b8e8b841", + "object": "chat.completion", + "created": 1742959065, + "model": "Intel/neural-chat-7b-v3-3", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": " Deep Learning refers to a modern approach of Artificial Intelligence that aims to replicate the way human brains process information by teaching computers to learn from data without extensive programming", + "tool_calls": [] + }, + "logprobs": null, + "finish_reason": "length", + "stop_reason": null + } + ], + "usage": { "prompt_tokens": 15, "total_tokens": 47, "completion_tokens": 32, "prompt_tokens_details": null }, + "prompt_logprobs": null +} ``` If the service response has a meaningful response in the value of the "choices.message.content" key, @@ -316,7 +336,7 @@ then we consider the vLLM service to be successfully launched #### If you use TGI: ```bash -DATA='{"inputs":"",'\ +DATA='{"inputs":"What is Deep Learning?",'\ '"parameters":{"max_new_tokens":256,"do_sample": true}}' curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ @@ -328,7 +348,9 @@ curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ Checking the response from the service. The response should be similar to JSON: ```json - +{ + "generated_text": "\n\nDeep Learning is a subset of machine learning, which focuses on developing methods inspired by the functioning of the human brain; more specifically, the way it processes and acquires various types of knowledge and information. To enable deep learning, the networks are composed of multiple processing layers that form a hierarchy, with each layer learning more complex and abstraction levels of data representation.\n\nThe principle of Deep Learning is to emulate the structure of neurons in the human brain to construct artificial neural networks capable to accomplish complicated pattern recognition tasks more effectively and accurately. 
Therefore, these neural networks contain a series of hierarchical components, where units in earlier layers receive simple inputs and are activated by these inputs. The activation of the units in later layers are the results of multiple nonlinear transformations generated from reconstructing and integrating the information in previous layers. In other words, by combining various pieces of information at each layer, a Deep Learning network can extract the input features that best represent the structure of data, providing their outputs at the last layer or final level of abstraction.\n\nThe main idea of using these 'deep' networks in contrast to regular algorithms is that they are capable of representing hierarchical relationships that exist within the data and learn these representations by" +} ``` If the service response has a meaningful response in the value of the "generated_text" key, From 857cca1e2a0793711245f2cdcc1e8278eb4826a1 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 2 Apr 2025 19:28:24 +0700 Subject: [PATCH 17/25] Update test_compose_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_on_rocm.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/MultimodalQnA/tests/test_compose_on_rocm.sh b/MultimodalQnA/tests/test_compose_on_rocm.sh index 9ba5c68c90..a93d898f57 100644 --- a/MultimodalQnA/tests/test_compose_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_on_rocm.sh @@ -72,6 +72,7 @@ function setup_env() { export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions" export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get" export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" + export MODEL_CACHE="/var/opea/multimodalqna-service/data" } function start_services() { From b951830d191fe832e9e7842d7c1605187ad0d4bd Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Thu, 3 Apr 2025 11:03:38 +0700 Subject: [PATCH 18/25] Update test_compose_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_on_rocm.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MultimodalQnA/tests/test_compose_on_rocm.sh b/MultimodalQnA/tests/test_compose_on_rocm.sh index a93d898f57..cf8a34f4a8 100644 --- a/MultimodalQnA/tests/test_compose_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_on_rocm.sh @@ -78,7 +78,7 @@ function setup_env() { function start_services() { cd $WORKPATH/docker_compose/amd/gpu/rocm docker compose -f compose.yaml up -d > ${LOG_PATH}/start_services_with_compose.log - sleep 1m + sleep 5m } function prepare_data() { From 4bffe57d7fd64b555402dda9eedd278b96997a7e Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Thu, 3 Apr 2025 13:01:31 +0700 Subject: [PATCH 19/25] Add containers ready check Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_on_rocm.sh | 10 +++++++++- MultimodalQnA/tests/test_compose_vllm_on_rocm.sh | 10 +++++++++- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/MultimodalQnA/tests/test_compose_on_rocm.sh b/MultimodalQnA/tests/test_compose_on_rocm.sh index 80529c4760..1992caf16d 100644 --- a/MultimodalQnA/tests/test_compose_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_on_rocm.sh @@ -78,7 +78,15 @@ function setup_env() { function start_services() { cd $WORKPATH/docker_compose/amd/gpu/rocm docker compose -f compose.yaml up -d > ${LOG_PATH}/start_services_with_compose.log - sleep 5m + n=0 + until [[ "$n" -ge 100 ]]; do + docker logs tgi-rocm >& $LOG_PATH/search-vllm-service_start.log + 
if grep -q "Connected" $LOG_PATH/search-vllm-service_start.log; then + break + fi + sleep 10s + n=$((n+1)) + done } function prepare_data() { diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index 251a24081e..ef9023dcfd 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -76,7 +76,15 @@ function setup_env() { function start_services() { cd $WORKPATH/docker_compose/amd/gpu/rocm docker compose -f compose_vllm.yaml up -d > ${LOG_PATH}/start_services_with_compose.log - sleep 1m + n=0 + until [[ "$n" -ge 100 ]]; do + docker logs multimodalqna-vllm-service >& $LOG_PATH/search-vllm-service_start.log + if grep -q "Application startup complete" $LOG_PATH/search-vllm-service_start.log; then + break + fi + sleep 10s + n=$((n+1)) + done } function prepare_data() { From 451596a083e1dccd541d259064db573a51f3cfa1 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Thu, 3 Apr 2025 13:21:30 +0700 Subject: [PATCH 20/25] Update test_compose_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_on_rocm.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MultimodalQnA/tests/test_compose_on_rocm.sh b/MultimodalQnA/tests/test_compose_on_rocm.sh index 1992caf16d..6dc2bbb345 100644 --- a/MultimodalQnA/tests/test_compose_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_on_rocm.sh @@ -80,7 +80,7 @@ function start_services() { docker compose -f compose.yaml up -d > ${LOG_PATH}/start_services_with_compose.log n=0 until [[ "$n" -ge 100 ]]; do - docker logs tgi-rocm >& $LOG_PATH/search-vllm-service_start.log + docker logs tgi-llava-rocm-server >& $LOG_PATH/search-vllm-service_start.log if grep -q "Connected" $LOG_PATH/search-vllm-service_start.log; then break fi From 168f9a181b40631a566542fc04eac84af74bee0c Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Thu, 3 Apr 2025 15:21:47 +0700 Subject: [PATCH 21/25] Update test_compose_on_rocm.sh Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_on_rocm.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MultimodalQnA/tests/test_compose_on_rocm.sh b/MultimodalQnA/tests/test_compose_on_rocm.sh index 6dc2bbb345..416afa4feb 100644 --- a/MultimodalQnA/tests/test_compose_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_on_rocm.sh @@ -80,8 +80,8 @@ function start_services() { docker compose -f compose.yaml up -d > ${LOG_PATH}/start_services_with_compose.log n=0 until [[ "$n" -ge 100 ]]; do - docker logs tgi-llava-rocm-server >& $LOG_PATH/search-vllm-service_start.log - if grep -q "Connected" $LOG_PATH/search-vllm-service_start.log; then + docker logs tgi-llava-rocm-server >& $LOG_PATH/tgi-llava-rocm-server_start.log + if grep -q "Connected" $LOG_PATH/tgi-llava-rocm-server_start.log; then break fi sleep 10s From b8cc224d04128ac5668014f0de4457b68750542b Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Fri, 4 Apr 2025 14:10:17 +0700 Subject: [PATCH 22/25] update MODEL_CACHE var for tests Signed-off-by: Artem Astafev --- MultimodalQnA/tests/test_compose_on_rocm.sh | 2 +- MultimodalQnA/tests/test_compose_vllm_on_rocm.sh | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/MultimodalQnA/tests/test_compose_on_rocm.sh b/MultimodalQnA/tests/test_compose_on_rocm.sh index 416afa4feb..f776a7b101 100644 --- a/MultimodalQnA/tests/test_compose_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_on_rocm.sh @@ -72,7 +72,7 @@ function setup_env() { export 
DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions" export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get" export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" - export MODEL_CACHE="/var/opea/multimodalqna-service/data" + export MODEL_CACHE=${model_cache:-"/var/opea/multimodalqna-service/data"} } function start_services() { diff --git a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh index ef9023dcfd..65fb87d6c9 100644 --- a/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh +++ b/MultimodalQnA/tests/test_compose_vllm_on_rocm.sh @@ -71,6 +71,7 @@ function setup_env() { export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions" export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get" export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete" + export MODEL_CACHE=${model_cache:-"/var/opea/multimodalqna-service/data"} } function start_services() { From 00878ff2978fd7733f5a50ffa6fc749a37d9e8d9 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 9 Apr 2025 04:18:30 +0000 Subject: [PATCH 23/25] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 779869a70b..2470f805c1 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -144,7 +144,7 @@ security_opt: This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: -```yaml +````yaml shm_size: 1g devices: - /dev/kfd:/dev/kfd @@ -169,7 +169,7 @@ Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs ```bash ### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' -``` +```` #### Set variables value in set_env\*\*\*\*.sh file: From f6a8cad78c4b5bd3a6cec15191c3619e0929d7a5 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 9 Apr 2025 12:24:05 +0700 Subject: [PATCH 24/25] Update README.md Signed-off-by: Artem Astafev --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 2470f805c1..3bf2adcb8b 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -156,6 +156,7 @@ group_add: - video security_opt: - seccomp:unconfined +```` **How to Identify GPU Device IDs:** Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. 
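The README hunks above tell the reader to use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs, but no concrete command appears in the patches. As a hedged sketch outside the patch series (it assumes the standard ROCm driver utilities are installed; node names vary per host):

```bash
# List the DRI device nodes; the cardN/renderDN pairs are what the
# compose `devices:` entries shown in the README refer to
ls -l /dev/dri/

# Cross-check which physical GPU each node belongs to
rocm-smi --showproductname
```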
From 1ff771cbe8486810de20c22cbc6ed30a9997eb8d Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 9 Apr 2025 05:24:35 +0000 Subject: [PATCH 25/25] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 3bf2adcb8b..14e66d989a 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -144,7 +144,7 @@ security_opt: This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: -````yaml +```yaml shm_size: 1g devices: - /dev/kfd:/dev/kfd @@ -156,7 +156,7 @@ group_add: - video security_opt: - seccomp:unconfined -```` +``` **How to Identify GPU Device IDs:** Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. @@ -170,7 +170,7 @@ Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs ```bash ### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' -```` +``` #### Set variables value in set_env\*\*\*\*.sh file:
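The series ends on the heading that introduces the `set_env****.sh` scripts. For orientation, a minimal sketch of how the files touched by these patches are typically consumed at deploy time; the IP value is a placeholder, and the variable names are taken from the set_env and compose diffs earlier in the series:

```bash
# External IP of the server (not localhost) and the HF token read by the compose files
export HOST_IP='192.168.1.10'
export MULTIMODAL_HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'

# Source the variant matching the deployment, then start the stack
. set_env_vllm.sh                             # or: . set_env.sh for the TGI-based stack
docker compose -f compose_vllm.yaml up -d     # or: -f compose.yaml
```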