
Adding files to deploy MultimodalQnA application on ROCm vLLM #1737


Merged: 35 commits, Apr 10, 2025
Commits
152ead0
Init changes for MultimodalQnA for vLLM
artem-astafev Apr 1, 2025
4fa7a40
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 1, 2025
9732976
update Readme.md and test for vllm
artem-astafev Apr 2, 2025
f42da6a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 2, 2025
690432d
Update README.md
artem-astafev Apr 2, 2025
2adb675
Merge branch 'feature/MultimodalQnA_vLLM' of https://github.com/artem…
artem-astafev Apr 2, 2025
f2034ca
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 2, 2025
e16aaf5
Update README.md
artem-astafev Apr 2, 2025
b552fe1
Update Reamde.md and tests for vllm
artem-astafev Apr 2, 2025
fa37f35
Update test_compose_vllm_on_rocm.sh
artem-astafev Apr 2, 2025
a957538
update readme.md and tests
artem-astafev Apr 2, 2025
57bbbae
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 2, 2025
186f4cd
Update test_compose_vllm_on_rocm.sh
artem-astafev Apr 2, 2025
e5957a9
add ${MODEL_CACHE var
artem-astafev Apr 2, 2025
1a22a79
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 2, 2025
b5780af
Update test_compose_vllm_on_rocm.sh
artem-astafev Apr 2, 2025
eb56ceb
Merge branch 'feature/MultimodalQnA_vLLM' of https://github.com/artem…
artem-astafev Apr 2, 2025
e00e685
Update test_compose_vllm_on_rocm.sh
artem-astafev Apr 2, 2025
290e370
Update README.md
artem-astafev Apr 2, 2025
857cca1
Update test_compose_on_rocm.sh
artem-astafev Apr 2, 2025
b951830
Update test_compose_on_rocm.sh
artem-astafev Apr 3, 2025
6a9720e
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 3, 2025
e3df588
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 3, 2025
4bffe57
Add containers ready check
artem-astafev Apr 3, 2025
451596a
Update test_compose_on_rocm.sh
artem-astafev Apr 3, 2025
295ea86
Merge branch 'opea-project:main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 3, 2025
168f9a1
Update test_compose_on_rocm.sh
artem-astafev Apr 3, 2025
b8cc224
update MODEL_CACHE var for tests
artem-astafev Apr 4, 2025
062f0f0
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 7, 2025
f5ba94a
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 8, 2025
19c63ae
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 8, 2025
ede24b5
Merge branch 'main' into feature/MultimodalQnA_vLLM
artem-astafev Apr 9, 2025
00878ff
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 9, 2025
f6a8cad
Update README.md
artem-astafev Apr 9, 2025
1ff771c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 9, 2025
Files changed
530 changes: 367 additions & 163 deletions MultimodalQnA/docker_compose/amd/gpu/rocm/README.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions MultimodalQnA/docker_compose/amd/gpu/rocm/compose.yaml
@@ -105,7 +105,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN}
volumes:
- "/var/opea/multimodalqna-service/data:/data"
- "${MODEL_CACHE:-./data}:/data"
shm_size: 64g
devices:
- /dev/kfd:/dev/kfd
@@ -156,7 +156,7 @@ services:
MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
WHISPER_SERVER_PORT: ${WHISPER_SERVER_PORT}
WHISPER_SERVER_PORT: ${WHISPER_PORT}
WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT}
ipc: host
restart: always
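The change above replaces the hard-coded host path with a MODEL_CACHE bind mount that falls back to ./data, and the backend now reads WHISPER_PORT instead of WHISPER_SERVER_PORT. A minimal sketch of overriding both before deployment (the cache path is only an example):

export MODEL_CACHE=/var/opea/multimodalqna-service/data  # the previous hard-coded location
export WHISPER_PORT=7066                                 # forwarded to the backend as WHISPER_SERVER_PORT
docker compose -f compose.yaml up -d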
187 changes: 187 additions & 0 deletions MultimodalQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml
@@ -0,0 +1,187 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

services:
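# Whisper ASR service; provides audio transcription on port 7066.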
whisper-service:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
ports:
- "7066:7066"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
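# Redis Stack; vector store (6379) plus the RedisInsight UI (8001).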
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "6379:6379"
- "8001:8001"
dataprep-multimodal-redis:
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
container_name: dataprep-multimodal-redis
depends_on:
- redis-vector-db
- lvm
ports:
- "6007:5000"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_HOST: ${REDIS_HOST}
INDEX_NAME: ${INDEX_NAME}
LVM_ENDPOINT: "http://${LVM_SERVICE_HOST_IP}:9399/v1/lvm"
HUGGINGFACEHUB_API_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN}
MULTIMODAL_DATAPREP: true
DATAPREP_COMPONENT_NAME: "OPEA_DATAPREP_MULTIMODALREDIS"
restart: unless-stopped
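# BridgeTower embedding server; computes joint image-text embeddings on CPU.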
embedding-multimodal-bridgetower:
image: ${REGISTRY:-opea}/embedding-multimodal-bridgetower:${TAG:-latest}
container_name: embedding-multimodal-bridgetower
ports:
- ${EMBEDDER_PORT}:${EMBEDDER_PORT}
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
PORT: ${EMBEDDER_PORT}
healthcheck:
test: ["CMD-SHELL", "http_proxy='' curl -f http://localhost:${EMBEDDER_PORT}/v1/health_check"]
interval: 10s
timeout: 6s
retries: 18
start_period: 30s
entrypoint: ["python", "bridgetower_server.py", "--device", "cpu", "--model_name_or_path", $EMBEDDING_MODEL_ID]
restart: unless-stopped
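# Embedding microservice; wraps the BridgeTower server behind the OPEA embedding API.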
embedding:
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
container_name: embedding
depends_on:
embedding-multimodal-bridgetower:
condition: service_healthy
ports:
- ${MM_EMBEDDING_PORT_MICROSERVICE}:${MM_EMBEDDING_PORT_MICROSERVICE}
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
MMEI_EMBEDDING_ENDPOINT: ${MMEI_EMBEDDING_ENDPOINT}
MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
MULTIMODAL_EMBEDDING: true
restart: unless-stopped
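# Retriever microservice; similarity search over the Redis vector index.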
retriever-redis:
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
container_name: retriever-redis
depends_on:
- redis-vector-db
ports:
- "7000:7000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
INDEX_NAME: ${INDEX_NAME}
BRIDGE_TOWER_EMBEDDING: ${BRIDGE_TOWER_EMBEDDING}
LOGFLAG: ${LOGFLAG}
RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS"
restart: unless-stopped
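# vLLM serving engine for AMD ROCm GPUs; serves the multimodal LLM on container port 8011.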
multimodalqna-vllm-service:
image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
container_name: multimodalqna-vllm-service
ports:
- "${MULTIMODAL_VLLM_SERVICE_PORT:-8081}:8011"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN}
HF_TOKEN: ${MULTIMODAL_HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
VLLM_USE_TRITON_FLASH_ATTENTION: 0
PYTORCH_JIT: 0
volumes:
- "${MODEL_CACHE:-./data}:/data"
shm_size: 20G
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
- apparmor=unconfined
command: "--model ${MULTIMODAL_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 1 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\""
ipc: host
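# LVM microservice; exposes the vLLM backend through the /v1/lvm API on port 9399.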
lvm:
image: ${REGISTRY:-opea}/lvm:${TAG:-latest}
container_name: lvm
depends_on:
- multimodalqna-vllm-service
ports:
- "9399:9399"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LVM_COMPONENT_NAME: "OPEA_VLLM_LVM"
LVM_ENDPOINT: ${LVM_ENDPOINT}
LLM_MODEL_ID: ${MULTIMODAL_LLM_MODEL_ID}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
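# MultimodalQnA megaservice; orchestrates the embedding, retrieval, LVM, and Whisper services.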
multimodalqna:
image: ${REGISTRY:-opea}/multimodalqna:${TAG:-latest}
container_name: multimodalqna-backend-server
depends_on:
- redis-vector-db
- dataprep-multimodal-redis
- embedding
- retriever-redis
- lvm
ports:
- "8888:8888"
environment:
no_proxy: ${no_proxy}
https_proxy: ${https_proxy}
http_proxy: ${http_proxy}
MEGA_SERVICE_HOST_IP: ${MEGA_SERVICE_HOST_IP}
MM_EMBEDDING_SERVICE_HOST_IP: ${MM_EMBEDDING_SERVICE_HOST_IP}
MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
WHISPER_SERVER_PORT: ${WHISPER_PORT}
WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT}
ipc: host
restart: always
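# Gradio web UI for the MultimodalQnA backend, served on port 5173.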
multimodalqna-ui:
image: ${REGISTRY:-opea}/multimodalqna-ui:${TAG:-latest}
container_name: multimodalqna-gradio-ui-server
depends_on:
- multimodalqna
ports:
- "5173:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- DATAPREP_INGEST_SERVICE_ENDPOINT=${DATAPREP_INGEST_SERVICE_ENDPOINT}
- DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT=${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT}
- DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT=${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT}
ipc: host
restart: always

networks:
default:
driver: bridge
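A rough deployment flow for this vLLM variant, assuming the images referenced above have already been built or pulled:

cd MultimodalQnA/docker_compose/amd/gpu/rocm
source set_env_vllm.sh  # set host IP, Hugging Face token, and proxies first
docker compose -f compose_vllm.yaml up -d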
2 changes: 2 additions & 0 deletions MultimodalQnA/docker_compose/amd/gpu/rocm/set_env.sh
@@ -31,3 +31,5 @@ export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/datap
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions"
export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete"
export WHISPER_PORT="7066"
export WHISPER_SERVER_ENDPOINT="http://${HOST_IP}:${WHISPER_PORT}/v1/asr"
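A quick sanity check that the new exports resolve as intended, assuming the script has been sourced on a host where HOST_IP is set:

source set_env.sh
echo "${WHISPER_SERVER_ENDPOINT}"  # expected: http://<HOST_IP>:7066/v1/asr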
35 changes: 35 additions & 0 deletions MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh
@@ -0,0 +1,35 @@
#!/usr/bin/env bash

# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

export HOST_IP=${your_host_ip_address}
export MULTIMODAL_HUGGINGFACEHUB_API_TOKEN=${your_huggingfacehub_token}
export MULTIMODAL_TGI_SERVICE_PORT="8399"
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export BRIDGE_TOWER_EMBEDDING=true
export EMBEDDER_PORT=6006
export MMEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:$EMBEDDER_PORT"
export MM_EMBEDDING_PORT_MICROSERVICE=6000
export REDIS_URL="redis://${HOST_IP}:6379"
export REDIS_HOST=${HOST_IP}
export INDEX_NAME="mm-rag-redis"
export VLLM_SERVER_PORT=8081
export LVM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVER_PORT}"
export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot"
export WHISPER_MODEL="base"
export MM_EMBEDDING_SERVICE_HOST_IP=${HOST_IP}
export MM_RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
export LVM_SERVICE_HOST_IP=${HOST_IP}
export MEGA_SERVICE_HOST_IP=${HOST_IP}
export BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:8888/v1/multimodalqna"
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/ingest"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions"
export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete"
export WHISPER_PORT="7066"
export WHISPER_SERVER_ENDPOINT="http://${HOST_IP}:${WHISPER_PORT}/v1/asr"
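Once the stack is up, the vLLM server's OpenAI-compatible API can be probed to confirm the model has loaded; a sketch using the port exported above:

source set_env_vllm.sh
curl -s "http://${HOST_IP}:${VLLM_SERVER_PORT}/v1/models"  # should list the model served by multimodalqna-vllm-service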
5 changes: 5 additions & 0 deletions MultimodalQnA/docker_image_build/build.yaml
@@ -77,3 +77,8 @@ services:
dockerfile: comps/tts/src/Dockerfile
extends: multimodalqna
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
vllm-rocm:
build:
context: GenAIComps
dockerfile: comps/third_parties/vllm/src/Dockerfile.amd_gpu
image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
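Building the new image would look roughly like this, assuming GenAIComps is cloned inside docker_image_build as the build context expects:

cd MultimodalQnA/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git  # skip if already present
docker compose -f build.yaml build vllm-rocm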
11 changes: 10 additions & 1 deletion MultimodalQnA/tests/test_compose_on_rocm.sh
@@ -72,12 +72,21 @@ function setup_env() {
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/generate_captions"
export DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete"
export MODEL_CACHE=${model_cache:-"/var/opea/multimodalqna-service/data"}
}

function start_services() {
cd $WORKPATH/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
sleep 1m
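# Wait for the TGI LLaVA server to finish startup: poll its log for "Connected" (up to 100 tries, 10 s apart).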
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-llava-rocm-server >& $LOG_PATH/tgi-llava-rocm-server_start.log
if grep -q "Connected" $LOG_PATH/tgi-llava-rocm-server_start.log; then
break
fi
sleep 10s
n=$((n+1))
done
}

function prepare_data() {