
Commit 082f5d5

chyundunovDatamonsters authored and Spycsh committed
Refine readme of AudioQnA (opea-project#1804)
Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>
1 parent a48f11d commit 082f5d5

File tree

4 files changed
+264 -215 lines changed

AudioQnA/README.md

Lines changed: 15 additions & 32 deletions
@@ -2,6 +2,13 @@
 
 AudioQnA is an example that demonstrates the integration of Generative AI (GenAI) models for performing question-answering (QnA) on audio files, with the added functionality of Text-to-Speech (TTS) for generating spoken responses. The example showcases how to convert audio input to text using Automatic Speech Recognition (ASR), generate answers to user queries using a language model, and then convert those answers back to speech using Text-to-Speech (TTS).
 
+## Table of Contents
+
+1. [Architecture](#architecture)
+2. [Deployment Options](#deployment-options)
+
+## Architecture
+
 The AudioQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
 
 ```mermaid
@@ -59,37 +66,13 @@ flowchart LR
 
 ```
 
-## Deploy AudioQnA Service
-
-The AudioQnA service can be deployed on either Intel Gaudi2 or Intel Xeon Scalable Processor.
-
-### Deploy AudioQnA on Gaudi
-
-Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for instructions on deploying AudioQnA on Gaudi.
-
-### Deploy AudioQnA on Xeon
-
-Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for instructions on deploying AudioQnA on Xeon.
-
-## Deploy using Helm Chart
-
-Refer to the [AudioQnA helm chart](./kubernetes/helm/README.md) for instructions on deploying AudioQnA on Kubernetes.
-
-## Supported Models
-
-### ASR
-
-The default model is [openai/whisper-small](https://huggingface.co/openai/whisper-small). It also supports all models in the Whisper family, such as `openai/whisper-large-v3`, `openai/whisper-medium`, `openai/whisper-base`, `openai/whisper-tiny`, etc.
-
-To replace the model, please edit the `compose.yaml` and add the `command` line to pass the name of the model you want to use:
-
-```yaml
-services:
-  whisper-service:
-    ...
-    command: --model_name_or_path openai/whisper-tiny
-```
+## Deployment Options
 
-### TTS
+The table below lists the currently available deployment options. Each option links to a detailed guide for implementing this example on the selected hardware.
 
-The default model is [microsoft/SpeechT5](https://huggingface.co/microsoft/speecht5_tts). We currently do not support replacing the model. More models under the commercial license will be added in the future.
+| Category               | Deployment Option | Description                                                      |
+| ---------------------- | ----------------- | ---------------------------------------------------------------- |
+| On-premise Deployments | Docker compose    | [AudioQnA deployment on Xeon](./docker_compose/intel/cpu/xeon)   |
+|                        |                   | [AudioQnA deployment on Gaudi](./docker_compose/intel/hpu/gaudi) |
+|                        |                   | [AudioQnA deployment on AMD ROCm](./docker_compose/amd/gpu/rocm) |
+|                        | Kubernetes        | [Helm Charts](./kubernetes/helm)                                 |
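For any of the Docker Compose options in the table above, the starting point is the same: clone the repository and change into the directory for the chosen hardware. A minimal sketch, reusing the clone command from the build guide and the paths from the table:

```bash
# Clone the examples repository and pick a deployment directory from the table above
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA/docker_compose/intel/cpu/xeon  # or intel/hpu/gaudi, amd/gpu/rocm
```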

AudioQnA/README_miscellaneous.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# AudioQnA Docker Image Build
+
+## Table of Contents
+
+1. [Build MegaService Docker Image](#build-megaservice-docker-image)
+2. [Build UI Docker Image](#build-ui-docker-image)
+3. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
+4. [Troubleshooting](#troubleshooting)
+
+## Build MegaService Docker Image
+
+To construct the MegaService of AudioQnA, the [GenAIExamples](https://github.com/opea-project/GenAIExamples.git) repository is used. Build the MegaService Docker image using the command below:
+
+```bash
+git clone https://github.com/opea-project/GenAIExamples.git
+cd GenAIExamples/AudioQnA
+docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
+```
19+
20+
## Build UI Docker Image
21+
22+
Build frontend Docker image using below command:
23+
24+
```bash
25+
cd GenAIExamples/AudioQnA/ui
26+
docker build -t opea/audioqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
27+
```
28+
29+
## Generate a HuggingFace Access Token
30+
31+
Some HuggingFace resources, such as some models, are only accessible if the developer has an access token. In the absence of a HuggingFace access token, the developer can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
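Once generated, the token is typically passed to the services through the `HUGGINGFACEHUB_API_TOKEN` environment variable used throughout these guides. A minimal sketch, assuming the placeholder value below is replaced with your actual token:

```bash
# Placeholder token for illustration; substitute the value generated on huggingface.co
export HUGGINGFACEHUB_API_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"

# Optional: cache the token locally so huggingface_hub tooling can reuse it
huggingface-cli login --token "$HUGGINGFACEHUB_API_TOKEN"
```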
+
+## Troubleshooting
+
+1. If you get errors like "Access Denied", [validate the microservices](https://github.com/opea-project/GenAIExamples/tree/main/AudioQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
+
+   ```bash
+   curl http://${host_ip}:7055/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
+   ```
+
+2. (Docker only) If all microservices work well, check the port ${host_ip}:7777; the port may be allocated by another user. You can modify the port mapping in `compose.yaml` (see the sketch below).
+3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.
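For items 2 and 3, a minimal sketch of such a `compose.yaml` edit, assuming the backend service's default mapping of host port 3008 to container port 8888 shown in the Xeon guide; the free host port 3009 and the replacement container name are hypothetical:

```yaml
services:
  audioqna-xeon-backend-server:
    # Change the container name if the default one is already in use
    container_name: audioqna-xeon-backend-server-2
    ports:
      # Host port (left) remapped to a free port; container port (right) unchanged
      - "3009:8888"
```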
AudioQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 99 additions & 87 deletions
@@ -1,123 +1,146 @@
-# Build Mega Service of AudioQnA on Xeon
+# Deploying AudioQnA on Intel® Xeon® Processors
 
-This document outlines the deployment process for a AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server.
-
-The default pipeline deploys with vLLM as the LLM serving component. It also provides options of using TGI backend for LLM microservice, please refer to [Start the MegaService](#-start-the-megaservice) section in this page.
+This document outlines the single-node deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on an Intel Xeon server. The steps include pulling Docker images, deploying containers via Docker Compose, and running services such as the `llm` microservice.
 
 Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
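Access to this gated model can be checked before deployment. A minimal sketch, assuming `huggingface-cli` (from the `huggingface_hub` package) is installed and the token is exported as described in README_miscellaneous.md:

```bash
# Attempt to download the gated model; this fails if access has not been granted
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --token "$HUGGINGFACEHUB_API_TOKEN" --local-dir ./Meta-Llama-3-8B-Instruct
```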
 
-## 🚀 Build Docker images
+## Table of Contents
 
-### 1. Source Code install GenAIComps
+1. [AudioQnA Quick Start Deployment](#audioqna-quick-start-deployment)
+2. [AudioQnA Docker Compose Files](#audioqna-docker-compose-files)
+3. [Validate Microservices](#validate-microservices)
+4. [Conclusion](#conclusion)
 
-```bash
-git clone https://github.com/opea-project/GenAIComps.git
-cd GenAIComps
-```
+## AudioQnA Quick Start Deployment
 
-### 2. Build ASR Image
+This section describes how to quickly deploy and test the AudioQnA service manually on an Intel® Xeon® processor. The basic steps are:
 
-```bash
-docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/whisper/src/Dockerfile .
-```
+1. [Access the Code](#access-the-code)
+2. [Configure the Deployment Environment](#configure-the-deployment-environment)
+3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
+4. [Check the Deployment Status](#check-the-deployment-status)
+5. [Validate the Pipeline](#validate-the-pipeline)
+6. [Cleanup the Deployment](#cleanup-the-deployment)
 
-### 3. Build vLLM Image
+### Access the Code
+
+Clone the GenAIExamples repository and access the AudioQnA Intel® Xeon® platform Docker Compose files and supporting scripts:
 
 ```bash
-git clone https://github.com/vllm-project/vllm.git
-cd ./vllm/
-VLLM_VER="v0.8.3"
-git checkout ${VLLM_VER}
-docker build --no-cache --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.cpu -t opea/vllm:latest --shm-size=128g .
+git clone https://github.com/opea-project/GenAIExamples.git
+cd GenAIExamples/AudioQnA
 ```
 
-### 4. Build TTS Image
+Then check out a released version, such as v1.2:
 
 ```bash
-docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/speecht5/src/Dockerfile .
-
-# multilang tts (optional)
-docker build -t opea/gpt-sovits:latest --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -f comps/third_parties/gpt-sovits/src/Dockerfile .
+git checkout v1.2
 ```
 
-### 5. Build MegaService Docker Image
+### Configure the Deployment Environment
 
-To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
+To set up the environment variables for deploying the AudioQnA services, set the parameters specific to the deployment environment and source the `set_env.sh` script in this directory:
 
 ```bash
-git clone https://github.com/opea-project/GenAIExamples.git
-cd GenAIExamples/AudioQnA/
-docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
+export host_ip="External_Public_IP" # ip address of the node
+export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
+export http_proxy="Your_HTTP_Proxy" # http proxy if any
+export https_proxy="Your_HTTPs_Proxy" # https proxy if any
+export no_proxy=localhost,127.0.0.1,$host_ip,whisper-service,speecht5-service,vllm-service,tgi-service,audioqna-xeon-backend-server,audioqna-xeon-ui-server # additional no proxies if needed
+export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
+source ./set_env.sh
 ```
 
-Then run the command `docker images`, you will have following images ready:
-
-1. `opea/whisper:latest`
-2. `opea/vllm:latest`
-3. `opea/speecht5:latest`
-4. `opea/audioqna:latest`
-5. `opea/gpt-sovits:latest` (optional)
+Consult the section on [AudioQnA Service configuration](#audioqna-configuration) for information on how service-specific configuration parameters affect deployments.
 
-## 🚀 Set the environment variables
+### Deploy the Services Using Docker Compose
 
-Before starting the services with `docker compose`, you have to recheck the following environment variables.
+To deploy the AudioQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below; it uses the `compose.yaml` file:
 
 ```bash
-export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
-export HUGGINGFACEHUB_API_TOKEN=<your HF token>
+cd docker_compose/intel/cpu/xeon
+docker compose -f compose.yaml up -d
+```
 
-export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
+> **Note**: developers should build the Docker images from source when:
+>
+> - Developing off the git main branch (as the container's ports in the repo may differ from those in the published Docker image).
+> - Unable to download the Docker image.
+> - Using a specific version of the Docker image.
 
-export MEGA_SERVICE_HOST_IP=${host_ip}
-export WHISPER_SERVER_HOST_IP=${host_ip}
-export SPEECHT5_SERVER_HOST_IP=${host_ip}
-export LLM_SERVER_HOST_IP=${host_ip}
-export GPT_SOVITS_SERVER_HOST_IP=${host_ip}
+Please refer to the table below to build different microservices from source:
 
-export WHISPER_SERVER_PORT=7066
-export SPEECHT5_SERVER_PORT=7055
-export GPT_SOVITS_SERVER_PORT=9880
-export LLM_SERVER_PORT=3006
+| Microservice | Deployment Guide                                                                                                                  |
+| ------------ | --------------------------------------------------------------------------------------------------------------------------------- |
+| vLLM         | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker)                     |
+| LLM          | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms)                                                 |
+| WHISPER      | [Whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/src#211-whisper-server-image)                 |
+| SPEECHT5     | [SpeechT5 build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/tts/src#211-speecht5-server-image)               |
+| GPT-SOVITS   | [GPT-SOVITS build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/gpt-sovits/src#build-the-image)  |
+| MegaService  | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image)                                      |
+| UI           | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image)                                                  |
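As a concrete illustration of one such build, the prior revision of this guide built the vLLM CPU serving image from source as follows; the version pin `v0.8.3` was current as of that revision, so consult the vLLM build guide above for up-to-date instructions:

```bash
# Build the vLLM CPU serving image from source (commands retained from the prior revision of this guide)
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
VLLM_VER="v0.8.3"
git checkout ${VLLM_VER}
docker build --no-cache --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
  -f docker/Dockerfile.cpu -t opea/vllm:latest --shm-size=128g .
```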
 
-export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
-```
+### Check the Deployment Status
 
-or use set_env.sh file to setup environment variables.
+After running docker compose, check if all the containers launched via docker compose have started:
 
-Note:
+```bash
+docker ps -a
+```
 
-- Please replace host_ip with your external IP address, do not use localhost.
-- If you are in a proxy environment, also set the proxy-related environment variables:
+For the default deployment, the following 5 containers should have started:
 
 ```
-export http_proxy="Your_HTTP_Proxy"
-export https_proxy="Your_HTTPs_Proxy"
-# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
-export no_proxy="Your_No_Proxy",${host_ip},whisper-service,speecht5-service,gpt-sovits-service,tgi-service,vllm-service,audioqna-xeon-backend-server,audioqna-xeon-ui-server
+1c67e44c39d2   opea/audioqna-ui:latest   "docker-entrypoint.s…"   About a minute ago   Up About a minute             0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   audioqna-xeon-ui-server
+833a42677247   opea/audioqna:latest      "python audioqna.py"     About a minute ago   Up About a minute             0.0.0.0:3008->8888/tcp, :::3008->8888/tcp   audioqna-xeon-backend-server
+5dc4eb9bf499   opea/speecht5:latest      "python speecht5_ser…"   About a minute ago   Up About a minute             0.0.0.0:7055->7055/tcp, :::7055->7055/tcp   speecht5-service
+814e6efb1166   opea/vllm:latest          "python3 -m vllm.ent…"   About a minute ago   Up About a minute (healthy)   0.0.0.0:3006->80/tcp, :::3006->80/tcp       vllm-service
+46f7a00f4612   opea/whisper:latest       "python whisper_serv…"   About a minute ago   Up About a minute             0.0.0.0:7066->7066/tcp, :::7066->7066/tcp   whisper-service
 ```
 
-## 🚀 Start the MegaService
+If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
+
+### Validate the Pipeline
+
+Once the AudioQnA services are running, test the pipeline using the following command:
 
 ```bash
-cd GenAIExamples/AudioQnA/docker_compose/intel/cpu/xeon/
-```
+# Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the base64 string to the megaservice endpoint.
+# The megaservice will return a spoken response as a base64 string. To listen to the response, decode the base64 string and save it as a .wav file.
+wget https://github.com/intel/intel-extension-for-transformers/raw/refs/heads/main/intel_extension_for_transformers/neural_chat/assets/audio/sample_2.wav
+base64_audio=$(base64 -w 0 sample_2.wav)
 
-If use vLLM as the LLM serving backend:
+# if you are using speecht5 as the tts service, voice can be "default" or "male"
+# if you are using gpt-sovits for the tts service, you can set the reference audio following https://github.com/opea-project/GenAIComps/blob/main/comps/third_parties/gpt-sovits/src/README.md
 
+curl http://${host_ip}:3008/v1/audioqna \
+  -X POST \
+  -H "Content-Type: application/json" \
+  -d "{\"audio\": \"${base64_audio}\", \"max_tokens\": 64, \"voice\": \"default\"}" \
+  | sed 's/^"//;s/"$//' | base64 -d > output.wav
 ```
-docker compose up -d
 
-# multilang tts (optional)
-docker compose -f compose_multilang.yaml up -d
-```
+**Note**: Access the AudioQnA UI in a web browser through this URL: `http://${host_ip}:5173`. Please confirm that port `5173` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate Microservices](#validate-microservices) section.
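As a quick reachability check before opening the browser, the UI port can be probed from the command line; a minimal sketch, assuming `curl` is available and `host_ip` is set as above:

```bash
# Expect a 200-range HTTP status code if the UI server is reachable and the port is open
curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:5173
```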
 
-If use TGI as the LLM serving backend:
+### Cleanup the Deployment
 
+To stop the containers associated with the deployment, execute the following command:
+
+```bash
+docker compose -f compose.yaml down
 ```
-docker compose -f compose_tgi.yaml up -d
-```
 
-## 🚀 Test MicroServices
+## AudioQnA Docker Compose Files
+
+When deploying an AudioQnA pipeline on an Intel® Xeon® platform, you can pick and choose among different large language model serving frameworks, and between a single-English and a multi-language TTS component. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
+
+| File                                               | Description                                                                                |
+| -------------------------------------------------- | ------------------------------------------------------------------------------------------ |
+| [compose.yaml](./compose.yaml)                     | Default compose file using vLLM as the serving framework                                    |
+| [compose_tgi.yaml](./compose_tgi.yaml)             | The LLM serving framework is TGI. All other configurations remain the same as the default  |
+| [compose_multilang.yaml](./compose_multilang.yaml) | The TTS component is GPT-SoVITS. All other configurations remain the same as the default   |
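To launch one of the alternative configurations, pass the corresponding file to Docker Compose; for example, the multi-language TTS variant (a command carried over from the prior revision of this guide):

```bash
# Launch the GPT-SoVITS (multi-language TTS) variant instead of the default pipeline
docker compose -f compose_multilang.yaml up -d
```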
+
+## Validate Microservices
 
 1. Whisper Service
 
@@ -161,25 +184,14 @@ docker compose -f compose_tgi.yaml up -d
 
 3. TTS Service
 
-```
+```bash
 # speecht5 service
 curl http://${host_ip}:${SPEECHT5_SERVER_PORT}/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
 
 # gpt-sovits service (optional)
 curl http://${host_ip}:${GPT_SOVITS_SERVER_PORT}/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
 ```
 
-## 🚀 Test MegaService
-
-Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
-base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
-to the response, decode the base64 string and save it as a .wav file.
+## Conclusion
 
-```bash
-# if you are using speecht5 as the tts service, voice can be "default" or "male"
-# if you are using gpt-sovits for the tts service, you can set the reference audio following https://github.com/opea-project/GenAIComps/blob/main/comps/third_parties/gpt-sovits/src/README.md
-curl http://${host_ip}:3008/v1/audioqna \
-  -X POST \
-  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
-  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
-```
+This guide should enable developers to deploy the default configuration or any of the other compose YAML files for different configurations. It also highlights the configurable parameters that can be set before deployment.
