Commit fdbc27a

AvatarChatbot - Adding files to deploy AvatarChatbot application on AMD GPU (opea-project#1288)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
1 parent 5f4b182 commit fdbc27a

4 files changed: +584 -0 lines changed
# Build Mega Service of AvatarChatbot on AMD GPU

This document outlines the deployment process for an AvatarChatbot application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an AMD GPU server.

## 🚀 Build Docker images

### 1. Install GenAIComps from Source

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
### 2. Build ASR Image

```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .

docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/Dockerfile .
```
### 3. Build LLM Image

```bash
docker build --no-cache -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 4. Build TTS Image

```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .

docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/Dockerfile .
```
### 5. Build Animation Image

```bash
docker build -t opea/wav2lip:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/wav2lip/src/Dockerfile .

docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/Dockerfile .
```
### 6. Build MegaService Docker Image

To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AvatarChatbot/
docker build --no-cache -t opea/avatarchatbot:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
After the builds complete, run `docker images`; you should have the following images ready:

1. `opea/whisper:latest`
2. `opea/asr:latest`
3. `opea/llm-textgen:latest`
4. `opea/speecht5:latest`
5. `opea/tts:latest`
6. `opea/wav2lip:latest`
7. `opea/animation:latest`
8. `opea/avatarchatbot:latest`
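As a quick check, the list above can be verified in one loop. This is a sketch: the image names come from the build commands in this guide, and `image_check.txt` is an arbitrary output file name.

```shell
# Sketch: verify all eight images built in the previous steps are present.
# Writes one line per image to image_check.txt for later inspection.
for img in opea/whisper opea/asr opea/llm-textgen opea/speecht5 \
           opea/tts opea/wav2lip opea/animation opea/avatarchatbot; do
  if docker image inspect "$img:latest" > /dev/null 2>&1; then
    echo "found   $img:latest"
  else
    echo "MISSING $img:latest"
  fi
done | tee image_check.txt
```

Any `MISSING` line means the corresponding build step needs to be rerun before continuing.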
## 🚀 Set the environment variables

Before starting the services with `docker compose`, verify that the following environment variables are set correctly for your host:

```bash
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export host_ip=$(hostname -I | awk '{print $1}')

export TGI_SERVICE_PORT=3006
export TGI_LLM_ENDPOINT=http://${host_ip}:${TGI_SERVICE_PORT}
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"

export ASR_ENDPOINT=http://${host_ip}:7066
export TTS_ENDPOINT=http://${host_ip}:7055
export WAV2LIP_ENDPOINT=http://${host_ip}:7860

export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export ANIMATION_SERVICE_HOST_IP=${host_ip}

export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export ANIMATION_SERVICE_PORT=3008

export DEVICE="cpu"
export WAV2LIP_PORT=7860
export INFERENCE_MODE='wav2lip+gfpgan'
export CHECKPOINT_PATH='/usr/local/lib/python3.11/site-packages/Wav2Lip/checkpoints/wav2lip_gan.pth'
export FACE="assets/img/avatar5.png"
# export AUDIO='assets/audio/eg3_ref.wav' # audio file path is optional; the base64 string in the POST request is used as input if set to 'None'
export AUDIO='None'
export FACESIZE=96
export OUTFILE="/outputs/result.mp4"
export GFPGAN_MODEL_VERSION=1.4 # latest version; can roll back to v1.3 if needed
export UPSCALE_FACTOR=1
export FPS=10
```
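A quick sanity check over these variables can save a failed startup. The helper below is a sketch (not part of the project); `check_env` is a hypothetical name, and the variable names follow the export list above.

```shell
# Sketch: flag unset or empty variables before running `docker compose up`.
# Uses eval-based indirection so it works in plain sh as well as bash.
check_env() {
  _status=0
  for _name in "$@"; do
    eval "_val=\${$_name}"
    if [ -z "$_val" ]; then
      echo "unset: $_name" >&2
      _status=1
    fi
  done
  [ "$_status" -eq 0 ] && echo "environment OK"
  return "$_status"
}

# Check a representative subset; extend with the rest of the export list as needed.
check_env host_ip LLM_MODEL_ID TGI_LLM_ENDPOINT || echo "set the variables above before starting"
```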
**Warning:** In this solution the Wav2Lip service runs on the CPU only. To use AMD GPUs and achieve operational performance, the Wav2Lip image needs to be adapted to AMD hardware and the ROCm framework.

## 🚀 Start the MegaService

```bash
cd GenAIExamples/AvatarChatbot/docker_compose/intel/cpu/xeon/
docker compose -f compose.yaml up -d
```
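The service ports can take a little while to start accepting connections after `docker compose up -d` returns. A small helper can gate the tests that follow; this is a sketch that relies on bash's `/dev/tcp` virtual paths, so no netcat or curl is needed, and `wait_for_port` is a hypothetical name.

```shell
# Sketch: wait for a TCP port to accept connections before running tests.
wait_for_port() {
  host=$1; port=$2; tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # bash treats /dev/tcp/HOST/PORT as a TCP connection attempt
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "up"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "timeout"
  return 1
}

# Example: wait_for_port "$host_ip" 7066   # whisper service
```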
## 🚀 Test MicroServices

```bash
# whisper service
curl http://${host_ip}:7066/v1/asr \
  -X POST \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

# asr microservice
curl http://${host_ip}:3001/v1/audio/transcriptions \
  -X POST \
  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

# tgi service
curl http://${host_ip}:3006/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

# llm microservice
curl http://${host_ip}:3007/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
  -H 'Content-Type: application/json'

# speecht5 service
curl http://${host_ip}:7055/v1/tts \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

# tts microservice
curl http://${host_ip}:3002/v1/audio/speech \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

# wav2lip service
cd ../../../..
curl http://${host_ip}:7860/v1/wav2lip \
  -X POST \
  -d @assets/audio/sample_minecraft.json \
  -H 'Content-Type: application/json'

# animation microservice
curl http://${host_ip}:3008/v1/animation \
  -X POST \
  -d @assets/audio/sample_question.json \
  -H "Content-Type: application/json"
```
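The `audio` and `byte_str` fields in the whisper and asr requests above are base64-encoded WAV bytes. To test with your own recording, the payload can be generated as sketched below; `my_audio.wav` and `asr_payload.json` are placeholder names, and the four bytes written here merely stand in for a real recording.

```shell
# Sketch: turn a local WAV file into the JSON payload the ASR endpoints expect.
printf 'RIFF' > my_audio.wav       # stand-in bytes; use a real WAV in practice
b64=$(base64 -w 0 my_audio.wav)    # -w 0: single line, no wrapping (GNU coreutils)
printf '{"byte_str": "%s"}\n' "$b64" > asr_payload.json
cat asr_payload.json               # prints {"byte_str": "UklGRg=="} for these bytes
```

The resulting file can then be posted with `curl ... -d @asr_payload.json`.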
## 🚀 Test MegaService

```bash
curl http://${host_ip}:3009/v1/avatarchatbot \
  -X POST \
  -d @assets/audio/sample_whoareyou.json \
  -H 'Content-Type: application/json'
```

If the megaservice is running properly, you should see the following output:

```bash
"/outputs/result.mp4"
```

The output file will be saved in the current working directory, as `${PWD}` is mapped to `/outputs` inside the wav2lip-service Docker container.
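A quick host-side check that the render actually completed can be sketched as follows; run it from the directory you launched compose from (`check_output` is a hypothetical helper name).

```shell
# Sketch: check whether the rendered video has landed in the compose directory.
check_output() {
  if [ -f "$1" ]; then echo "present"; else echo "absent"; fi
}
check_output ./result.mp4
```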
## Gradio UI

```bash
cd $WORKPATH/GenAIExamples/AvatarChatbot
python3 ui/gradio/app_gradio_demo_avatarchatbot.py
```

The UI can be viewed at http://${host_ip}:7861

<img src="../../../../assets/img/UI.png" alt="UI Example" width="60%">

In the current version v1.0, you need to set the avatar figure image/video and the DL model choice in the environment variables before starting the AvatarChatbot backend service and running the UI. Only the audio question can be customized in the UI itself.

\*\* Changing the avatar figure between runs will be enabled in v2.0.
## Troubleshooting

```bash
cd GenAIExamples/AvatarChatbot/tests
export IMAGE_REPO="opea"
export IMAGE_TAG="latest"
export HUGGINGFACEHUB_API_TOKEN=<your_hf_token>

bash test_avatarchatbot_on_xeon.sh
```
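When a test fails, the corresponding container's logs are usually the fastest diagnosis. The sketch below captures them to a file; the container names follow the compose file in this deployment, and the file name is arbitrary.

```shell
# Sketch: save the last lines of a suspect container's logs for inspection.
service=whisper-service   # or asr-service, tts-service, tgi-service, wav2lip-service, ...
docker logs --tail 50 "$service" > "${service}.log" 2>&1 || true
echo "wrote ${service}.log"
```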
---
```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
  whisper-service:
    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
    container_name: whisper-service
    ports:
      - "7066:7066"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped
  asr:
    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
    container_name: asr-service
    ports:
      - "3001:9099"
    ipc: host
    environment:
      ASR_ENDPOINT: ${ASR_ENDPOINT}
  speecht5-service:
    image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
    container_name: speecht5-service
    ports:
      - "7055:7055"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped
  tts:
    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
    container_name: tts-service
    ports:
      - "3002:9088"
    ipc: host
    environment:
      TTS_ENDPOINT: ${TTS_ENDPOINT}
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    container_name: tgi-service
    ports:
      - "${TGI_SERVICE_PORT:-3006}:80"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    shm_size: 1g
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/:/dev/dri/
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192
  llm:
    image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
    container_name: llm-tgi-server
    depends_on:
      - tgi-service
    ports:
      - "3007:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    restart: unless-stopped
  wav2lip-service:
    image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
    container_name: wav2lip-service
    ports:
      - "7860:7860"
    ipc: host
    volumes:
      - ${PWD}:/outputs
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      DEVICE: ${DEVICE}
      INFERENCE_MODE: ${INFERENCE_MODE}
      CHECKPOINT_PATH: ${CHECKPOINT_PATH}
      FACE: ${FACE}
      AUDIO: ${AUDIO}
      FACESIZE: ${FACESIZE}
      OUTFILE: ${OUTFILE}
      GFPGAN_MODEL_VERSION: ${GFPGAN_MODEL_VERSION}
      UPSCALE_FACTOR: ${UPSCALE_FACTOR}
      FPS: ${FPS}
      WAV2LIP_PORT: ${WAV2LIP_PORT}
    restart: unless-stopped
  animation:
    image: ${REGISTRY:-opea}/animation:${TAG:-latest}
    container_name: animation-server
    ports:
      - "3008:9066"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      WAV2LIP_ENDPOINT: ${WAV2LIP_ENDPOINT}
    restart: unless-stopped
  avatarchatbot-backend-server:
    image: ${REGISTRY:-opea}/avatarchatbot:${TAG:-latest}
    container_name: avatarchatbot-backend-server
    depends_on:
      - asr
      - llm
      - tts
      - animation
    ports:
      - "3009:8888"
    environment:
      no_proxy: ${no_proxy}
      https_proxy: ${https_proxy}
      http_proxy: ${http_proxy}
      MEGA_SERVICE_HOST_IP: ${MEGA_SERVICE_HOST_IP}
      MEGA_SERVICE_PORT: ${MEGA_SERVICE_PORT}
      ASR_SERVICE_HOST_IP: ${ASR_SERVICE_HOST_IP}
      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
      LLM_SERVICE_HOST_IP: ${LLM_SERVICE_HOST_IP}
      LLM_SERVICE_PORT: ${LLM_SERVICE_PORT}
      LLM_SERVER_HOST_IP: ${LLM_SERVICE_HOST_IP}
      LLM_SERVER_PORT: ${LLM_SERVICE_PORT}
      TTS_SERVICE_HOST_IP: ${TTS_SERVICE_HOST_IP}
      TTS_SERVICE_PORT: ${TTS_SERVICE_PORT}
      ANIMATION_SERVICE_HOST_IP: ${ANIMATION_SERVICE_HOST_IP}
      ANIMATION_SERVICE_PORT: ${ANIMATION_SERVICE_PORT}
      WHISPER_SERVER_HOST_IP: ${WHISPER_SERVER_HOST_IP}
      WHISPER_SERVER_PORT: ${WHISPER_SERVER_PORT}
      SPEECHT5_SERVER_HOST_IP: ${SPEECHT5_SERVER_HOST_IP}
      SPEECHT5_SERVER_PORT: ${SPEECHT5_SERVER_PORT}
    ipc: host
    restart: always

networks:
  default:
    driver: bridge
```
