
Commit c760cac

mhbuehler, okhleif-IL, dmsuehir, pre-commit-ci[bot], and ashahba authored
Adds audio querying to MultimodalQ&A Example (#1225)
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Co-authored-by: Omar Khleif <omar.khleif@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
1 parent a50e4e6 commit c760cac

File tree

13 files changed: +389 -112 lines

MultimodalQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 46 additions & 11 deletions
@@ -78,6 +78,9 @@ export https_proxy=${your_http_proxy}
 export EMBEDDER_PORT=6006
 export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT/v1/encode"
 export MM_EMBEDDING_PORT_MICROSERVICE=6000
+export ASR_ENDPOINT=http://$host_ip:7066
+export ASR_SERVICE_PORT=3001
+export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
 export REDIS_URL="redis://${host_ip}:6379"
 export REDIS_HOST=${host_ip}
 export INDEX_NAME="mm-rag-redis"
@@ -144,7 +147,21 @@ docker build --no-cache -t opea/lvm-llava-svc:latest --build-arg https_proxy=$ht
 docker build --no-cache -t opea/dataprep-multimodal-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimodal/redis/langchain/Dockerfile .
 ```

-### 5. Build MegaService Docker Image
+### 5. Build asr images
+
+Build whisper server image
+
+```bash
+docker build --no-cache -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
+```
+
+Build asr image
+
+```bash
+docker build --no-cache -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+```
+
+### 6. Build MegaService Docker Image

 To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the [multimodalqna.py](../../../../multimodalqna.py) Python script. Build MegaService Docker image via below command:

@@ -155,7 +172,7 @@ docker build --no-cache -t opea/multimodalqna:latest --build-arg https_proxy=$ht
 cd ../..
 ```

-### 6. Build UI Docker Image
+### 7. Build UI Docker Image

 Build frontend Docker image via below command:

@@ -165,16 +182,19 @@ docker build --no-cache -t opea/multimodalqna-ui:latest --build-arg https_proxy=
 cd ../../../
 ```

-Then run the command `docker images`, you will have the following 8 Docker Images:
+Then run the command `docker images`, you will have the following 11 Docker Images:

 1. `opea/dataprep-multimodal-redis:latest`
 2. `opea/lvm-llava-svc:latest`
 3. `opea/lvm-llava:latest`
 4. `opea/retriever-multimodal-redis:latest`
-5. `opea/embedding-multimodal:latest`
-6. `opea/embedding-multimodal-bridgetower:latest`
-7. `opea/multimodalqna:latest`
-8. `opea/multimodalqna-ui:latest`
+5. `opea/whisper:latest`
+6. `opea/asr:latest`
+7. `opea/redis-vector-db`
+8. `opea/embedding-multimodal:latest`
+9. `opea/embedding-multimodal-bridgetower:latest`
+10. `opea/multimodalqna:latest`
+11. `opea/multimodalqna-ui:latest`

 ## 🚀 Start Microservices

@@ -240,7 +260,16 @@ curl http://${host_ip}:7000/v1/multimodal_retrieval \
     -d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
 ```

-4. lvm-llava
+4. asr
+
+```bash
+curl ${ASR_SERVICE_ENDPOINT} \
+    -X POST \
+    -H "Content-Type: application/json" \
+    -d '{"byte_str" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
+```
+
+5. lvm-llava

 ```bash
 curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
@@ -249,7 +278,7 @@ curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
     -d '{"prompt":"Describe the image please.", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}'
 ```

-5. lvm-llava-svc
+6. lvm-llava-svc

 ```bash
 curl http://${host_ip}:9399/v1/lvm \
@@ -274,7 +303,7 @@ curl http://${host_ip}:9399/v1/lvm \
     -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
 ```

-6. dataprep-multimodal-redis
+7. dataprep-multimodal-redis

 Download a sample video, image, and audio file and create a caption

@@ -348,7 +377,7 @@ curl -X POST \
     ${DATAPREP_DELETE_FILE_ENDPOINT}
 ```

-7. MegaService
+8. MegaService

 ```bash
 curl http://${host_ip}:8888/v1/multimodalqna \
@@ -357,6 +386,12 @@ curl http://${host_ip}:8888/v1/multimodalqna \
     -d '{"messages": "What is the revenue of Nike in 2023?"}'
 ```

+```bash
+curl http://${host_ip}:8888/v1/multimodalqna \
+    -H "Content-Type: application/json" \
+    -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
+```
+
 ```bash
 curl http://${host_ip}:8888/v1/multimodalqna \
     -H "Content-Type: application/json" \

MultimodalQnA/docker_compose/intel/cpu/xeon/compose.yaml

Lines changed: 24 additions & 0 deletions
@@ -2,6 +2,27 @@
 # SPDX-License-Identifier: Apache-2.0

 services:
+  whisper-service:
+    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
+    container_name: whisper-service
+    ports:
+      - "7066:7066"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+  asr:
+    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
+    container_name: asr-service
+    ports:
+      - "${ASR_SERVICE_PORT}:9099"
+    ipc: host
+    environment:
+      ASR_ENDPOINT: ${ASR_ENDPOINT}
+      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
+      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
   redis-vector-db:
     image: redis/redis-stack:7.2.0-v9
     container_name: redis-vector-db
@@ -102,6 +123,7 @@ services:
       - embedding-multimodal
       - retriever-multimodal-redis
       - lvm-llava-svc
+      - asr
     ports:
       - "8888:8888"
     environment:
@@ -113,6 +135,8 @@ services:
       MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
       MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
       LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
+      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
+      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
     ipc: host
     restart: always
   multimodalqna-ui:
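With `whisper-service` and `asr` added to the compose file (and `asr` added to the backend's `depends_on`), the new containers start with the rest of the stack. A quick sketch, assuming the images have been built and the variables from `set_env.sh` are exported in the current shell:

```bash
# Path assumed from the file layout above; adjust to where the repo is cloned.
cd GenAIExamples/MultimodalQnA/docker_compose/intel/cpu/xeon
docker compose -f compose.yaml up -d

# whisper-service publishes 7066, asr-service publishes ${ASR_SERVICE_PORT} (3001 here).
docker ps --filter "name=whisper-service" --filter "name=asr-service"
```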

MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ export https_proxy=${your_http_proxy}
 export EMBEDDER_PORT=6006
 export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT/v1/encode"
 export MM_EMBEDDING_PORT_MICROSERVICE=6000
+export ASR_ENDPOINT=http://$host_ip:7066
+export ASR_SERVICE_PORT=3001
+export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
 export REDIS_URL="redis://${host_ip}:6379"
 export REDIS_HOST=${host_ip}
 export INDEX_NAME="mm-rag-redis"
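Because compose.yaml reads `ASR_ENDPOINT`, `ASR_SERVICE_PORT`, and `ASR_SERVICE_ENDPOINT` from the shell environment, the script has to be sourced before `docker compose up`. A small sketch, assuming `host_ip` is set first as the script expects:

```bash
export host_ip=$(hostname -I | awk '{print $1}')   # one common way to pick the host IP
source ./set_env.sh
# Spot-check the new ASR variables before starting the stack.
echo "ASR_ENDPOINT=${ASR_ENDPOINT}"
echo "ASR_SERVICE_ENDPOINT=${ASR_SERVICE_ENDPOINT}"
```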

MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 39 additions & 10 deletions
@@ -37,6 +37,9 @@ export LVM_MODEL_ID="llava-hf/llava-v1.6-vicuna-13b-hf"
 export WHISPER_MODEL="base"
 export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
 export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
+export ASR_ENDPOINT=http://$host_ip:7066
+export ASR_SERVICE_PORT=3001
+export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
 export LVM_SERVICE_HOST_IP=${host_ip}
 export MEGA_SERVICE_HOST_IP=${host_ip}
 export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
@@ -95,7 +98,21 @@ docker build --no-cache -t opea/lvm-tgi:latest --build-arg https_proxy=$https_pr
 docker build --no-cache -t opea/dataprep-multimodal-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimodal/redis/langchain/Dockerfile .
 ```

-### 5. Build MegaService Docker Image
+### 5. Build asr images
+
+Build whisper server image
+
+```bash
+docker build --no-cache -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
+```
+
+Build asr image
+
+```bash
+docker build --no-cache -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+```
+
+### 6. Build MegaService Docker Image

 To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the [multimodalqna.py](../../../../multimodalqna.py) Python script. Build MegaService Docker image via below command:

@@ -114,16 +131,19 @@ cd GenAIExamples/MultimodalQnA/ui/
 docker build --no-cache -t opea/multimodalqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
 ```

-Then run the command `docker images`, you will have the following 8 Docker Images:
+Then run the command `docker images`, you will have the following 11 Docker Images:

 1. `opea/dataprep-multimodal-redis:latest`
 2. `opea/lvm-tgi:latest`
 3. `ghcr.io/huggingface/tgi-gaudi:2.0.6`
 4. `opea/retriever-multimodal-redis:latest`
-5. `opea/embedding-multimodal:latest`
-6. `opea/embedding-multimodal-bridgetower:latest`
-7. `opea/multimodalqna:latest`
-8. `opea/multimodalqna-ui:latest`
+5. `opea/whisper:latest`
+6. `opea/asr:latest`
+7. `opea/redis-vector-db`
+8. `opea/embedding-multimodal:latest`
+9. `opea/embedding-multimodal-bridgetower:latest`
+10. `opea/multimodalqna:latest`
+11. `opea/multimodalqna-ui:latest`

 ## 🚀 Start Microservices

@@ -189,7 +209,16 @@ curl http://${host_ip}:7000/v1/multimodal_retrieval \
     -d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
 ```

-4. TGI LLaVA Gaudi Server
+4. asr
+
+```bash
+curl ${ASR_SERVICE_ENDPOINT} \
+    -X POST \
+    -H "Content-Type: application/json" \
+    -d '{"byte_str" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
+```
+
+5. TGI LLaVA Gaudi Server

 ```bash
 curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
@@ -198,7 +227,7 @@ curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
     -H 'Content-Type: application/json'
 ```

-5. lvm-tgi
+6. lvm-tgi

 ```bash
 curl http://${host_ip}:9399/v1/lvm \
@@ -223,7 +252,7 @@ curl http://${host_ip}:9399/v1/lvm \
     -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
 ```

-6. Multimodal Dataprep Microservice
+7. Multimodal Dataprep Microservice

 Download a sample video, image, and audio file and create a caption

@@ -297,7 +326,7 @@ curl -X POST \
     ${DATAPREP_DELETE_FILE_ENDPOINT}
 ```

-7. MegaService
+8. MegaService

 ```bash
 curl http://${host_ip}:8888/v1/multimodalqna \
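The Gaudi deployment accepts the same audio payloads. For an end-to-end spoken query against the MegaService, using the `{"type": "audio", ...}` message format shown in the Xeon README above, a sketch assuming `ffmpeg` is installed and a hypothetical local recording `question.mp3` exists (Whisper is typically fed 16 kHz mono audio, so the clip is converted first):

```bash
# Convert the recording to 16 kHz mono WAV, base64-encode it, and send it as a user message.
ffmpeg -y -i question.mp3 -ar 16000 -ac 1 question.wav
AUDIO_B64=$(base64 -w 0 question.wav)
curl http://${host_ip}:8888/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -d "{\"messages\": [{\"role\": \"user\", \"content\": [{\"type\": \"audio\", \"audio\": \"${AUDIO_B64}\"}]}]}"
```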

MultimodalQnA/docker_compose/intel/hpu/gaudi/compose.yaml

Lines changed: 24 additions & 0 deletions
@@ -8,6 +8,27 @@ services:
     ports:
       - "6379:6379"
       - "8001:8001"
+  whisper-service:
+    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
+    container_name: whisper-service
+    ports:
+      - "7066:7066"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+  asr:
+    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
+    container_name: asr-service
+    ports:
+      - "${ASR_SERVICE_PORT}:9099"
+    ipc: host
+    environment:
+      ASR_ENDPOINT: ${ASR_ENDPOINT}
+      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
+      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
   dataprep-multimodal-redis:
     image: ${REGISTRY:-opea}/dataprep-multimodal-redis:${TAG:-latest}
     container_name: dataprep-multimodal-redis
@@ -119,6 +140,7 @@ services:
       - embedding-multimodal
      - retriever-multimodal-redis
       - lvm-tgi
+      - asr
     ports:
       - "8888:8888"
     environment:
@@ -130,6 +152,8 @@ services:
       MM_EMBEDDING_PORT_MICROSERVICE: ${MM_EMBEDDING_PORT_MICROSERVICE}
       MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
       LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
+      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
+      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
     ipc: host
     restart: always
   multimodalqna-ui:
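The two audio services carry no Gaudi-specific runtime settings in this compose file, so they run as ordinary CPU containers alongside the HPU services. To confirm they came up cleanly before testing, a sketch (directory path assumed from the file layout above):

```bash
cd GenAIExamples/MultimodalQnA/docker_compose/intel/hpu/gaudi
docker compose up -d whisper-service asr
# Follow both services' logs to check that they started without errors.
docker compose logs -f whisper-service asr
```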

MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ export https_proxy=${your_http_proxy}
 export EMBEDDER_PORT=6006
 export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT/v1/encode"
 export MM_EMBEDDING_PORT_MICROSERVICE=6000
+export ASR_ENDPOINT=http://$host_ip:7066
+export ASR_SERVICE_PORT=3001
+export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
 export REDIS_URL="redis://${host_ip}:6379"
 export REDIS_HOST=${host_ip}
 export INDEX_NAME="mm-rag-redis"

MultimodalQnA/docker_image_build/build.yaml

Lines changed: 12 additions & 0 deletions
@@ -59,3 +59,15 @@ services:
       dockerfile: comps/dataprep/multimodal/redis/langchain/Dockerfile
     extends: multimodalqna
     image: ${REGISTRY:-opea}/dataprep-multimodal-redis:${TAG:-latest}
+  whisper:
+    build:
+      context: GenAIComps
+      dockerfile: comps/asr/whisper/dependency/Dockerfile
+    extends: multimodalqna
+    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
+  asr:
+    build:
+      context: GenAIComps
+      dockerfile: comps/asr/whisper/Dockerfile
+    extends: multimodalqna
+    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
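These two entries let the new images be built the same way as the others in the example. A sketch, assuming `GenAIComps` has been cloned into the `docker_image_build` directory, since it is referenced as the build `context`:

```bash
cd GenAIExamples/MultimodalQnA/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git
docker compose -f build.yaml build --no-cache whisper asr
```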
