Commit 14f2c1d

CodeGen/CodeTrans - Adding files to deploy an application in the K8S environment using Helm

Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>

1 parent e56fac1 · commit 14f2c1d

23 files changed: +780 −350 lines

DocSum/benchmark_docsum.yaml

Lines changed: 3 additions & 7 deletions
```diff
@@ -3,7 +3,7 @@
 
 deploy:
   device: gaudi
-  version: 1.2.0
+  version: 1.3.0
   modelUseHostPath: /mnt/models
   HUGGINGFACEHUB_API_TOKEN: "" # mandatory
   node: [1]
@@ -20,14 +20,10 @@ deploy:
     memory_capacity: "8000Mi"
     replicaCount: [1]
 
-  teirerank:
-    enabled: False
-
   llm:
     engine: vllm # or tgi
     model_id: "meta-llama/Llama-3.2-3B-Instruct" # mandatory
-    replicaCount:
-      without_teirerank: [1] # When teirerank.enabled is False
+    replicaCount: [1]
     resources:
       enabled: False
       cards_per_instance: 1
@@ -78,7 +74,7 @@ benchmark:
 
   # workload, all of the test cases will run for benchmark
   bench_target: ["docsumfixed"] # specify the bench_target for benchmark
-  dataset: "/home/sdp/upload.txt" # specify the absolute path to the dataset file
+  dataset: "/home/sdp/pubmed_10.txt" # specify the absolute path to the dataset file
   summary_type: "stuff"
   stream: True
```
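The benchmark section now drives a single `replicaCount` and a PubMed sample dataset. A quick way to sanity-check the edited file still parses, as a minimal sketch (assumes Python 3 with PyYAML available on the host; the path is illustrative):

```bash
# Parse the benchmark config to catch indentation or syntax slips
# introduced while editing; prints OK on success.
python3 -c 'import yaml; yaml.safe_load(open("DocSum/benchmark_docsum.yaml")); print("OK")'
```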

DocSum/docker_compose/amd/gpu/rocm/README.md

Lines changed: 133 additions & 31 deletions
Large diffs are not rendered by default.

DocSum/docker_compose/amd/gpu/rocm/set_env.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 # Copyright (C) 2024 Advanced Micro Devices, Inc.
 # SPDX-License-Identifier: Apache-2.0
 
-export HOST_IP=''
+export HOST_IP=${ip_address}
 export DOCSUM_MAX_INPUT_TOKENS="2048"
 export DOCSUM_MAX_TOTAL_TOKENS="4096"
 export DOCSUM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
```

DocSum/docker_compose/amd/gpu/rocm/set_env_vllm.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 # Copyright (C) 2024 Advanced Micro Devices, Inc.
 # SPDX-License-Identifier: Apache-2.0
 
-export HOST_IP=''
+export HOST_IP=${ip_address}
 export DOCSUM_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
 export DOCSUM_MAX_INPUT_TOKENS=2048
 export DOCSUM_MAX_TOTAL_TOKENS=4096
```
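Both ROCm `set_env` scripts now take `HOST_IP` from an `ip_address` variable instead of an empty literal, so that variable must exist before the script is sourced. A minimal sketch of the expected call pattern (the `hostname -I` derivation is an assumption, mirroring how `intel/set_env.sh` in this same commit derives `host_ip`):

```bash
# Derive the host address first, then source the script and confirm
# HOST_IP picked it up.
export ip_address=$(hostname -I | awk '{print $1}')
source DocSum/docker_compose/amd/gpu/rocm/set_env.sh
echo "HOST_IP=${HOST_IP}"
```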

DocSum/docker_compose/intel/cpu/xeon/README.md

Lines changed: 37 additions & 21 deletions
````diff
@@ -21,40 +21,34 @@ This section describes how to quickly deploy and test the DocSum service manuall
 6. [Test the Pipeline](#test-the-pipeline)
 7. [Cleanup the Deployment](#cleanup-the-deployment)
 
-### Access the Code
+### Access the Code and Set Up Environment
 
 Clone the GenAIExample repository and access the ChatQnA Intel Xeon platform Docker Compose files and supporting scripts:
 
-```
+```bash
 git clone https://github.com/opea-project/GenAIExamples.git
-cd GenAIExamples/DocSum/docker_compose/intel/cpu/xeon/
+cd GenAIExamples/DocSum/docker_compose
+source intel/set_env.sh
 ```
 
-Checkout a released version, such as v1.2:
+NOTE: By default, vLLM performs a "warmup" at startup to optimize its performance for the specified model and the underlying platform, which can take a long time. For development (and, e.g., autoscaling), it can be skipped with `export VLLM_SKIP_WARMUP=true`.
 
-```
-git checkout v1.2
+Check out a released version, such as v1.3:
+
+```bash
+git checkout v1.3
 ```
 
 ### Generate a HuggingFace Access Token
 
 Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
 
-### Configure the Deployment Environment
-
-To set up environment variables for deploying DocSum services, source the _set_env.sh_ script in this directory:
-
-```
-source ./set_env.sh
-```
-
-The _set_env.sh_ script will prompt for required and optional environment variables used to configure the DocSum services. If a value is not entered, the script will use a default value for the same. It will also generate a _.env_ file defining the desired configuration. Consult the section on [DocSum Service configuration](#docsum-service-configuration) for information on how service specific configuration parameters affect deployments.
-
 ### Deploy the Services Using Docker Compose
 
 To deploy the DocSum services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
 
 ```bash
+cd intel/cpu/xeon/
 docker compose up -d
 ```
 
@@ -78,13 +72,13 @@ Please refer to the table below to build different microservices from source:
 
 After running docker compose, check if all the containers launched via docker compose have started:
 
-```
+```bash
 docker ps -a
 ```
 
 For the default deployment, the following 5 containers should have started:
 
-```
+```bash
 CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS              PORTS                                       NAMES
 748f577b3c78   opea/whisper:latest            "python whisper_s…"      5 minutes ago   Up About a minute   0.0.0.0:7066->7066/tcp, :::7066->7066/tcp   docsum-xeon-whisper-server
 4eq8b7034fd9   opea/docsum-gradio-ui:latest   "docker-entrypoint.s…"   5 minutes ago   Up About a minute   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   docsum-xeon-ui-server
@@ -109,7 +103,7 @@ curl -X POST http://${host_ip}:8888/v1/docsum \
 
 To stop the containers associated with the deployment, execute the following command:
 
-```
+```bash
 docker compose -f compose.yaml down
 ```
 
@@ -156,16 +150,19 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "messages=" \
   -F "files=@/path to your file (.txt, .docx, .pdf)" \
   -F "max_tokens=32" \
-  -F "language=en" \
+  -F "language=en"
 ```
 
+Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.
+
 ### Query with audio and video
 
-> Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+> Audio and video can be passed as base64 strings or uploaded by providing a local file path.
 
 Audio:
 
 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -177,11 +174,21 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=audio" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp3, .wav)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```
 
 Video:
 
 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -193,6 +200,15 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=video" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp4)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```
 
 ### Query with long context
````
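The JSON variants of the audio/video queries expect a base64 payload in `messages`. A minimal sketch for producing one from a local file (assumes GNU coreutils `base64`; `sample.wav` is a placeholder):

```bash
# Encode the audio as a single-line base64 string and submit it as
# the messages payload.
AUDIO_B64=$(base64 -w 0 sample.wav)
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d "{\"type\": \"audio\", \"messages\": \"${AUDIO_B64}\"}"
```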

DocSum/docker_compose/intel/cpu/xeon/compose.yaml

Lines changed: 1 addition & 0 deletions
```diff
@@ -40,6 +40,7 @@ services:
       LLM_ENDPOINT: ${LLM_ENDPOINT}
       LLM_MODEL_ID: ${LLM_MODEL_ID}
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
       MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
       MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
       DocSum_COMPONENT_NAME: ${DocSum_COMPONENT_NAME}
```

DocSum/docker_compose/intel/cpu/xeon/compose_tgi.yaml

Lines changed: 1 addition & 0 deletions
```diff
@@ -40,6 +40,7 @@ services:
       LLM_ENDPOINT: ${LLM_ENDPOINT}
       LLM_MODEL_ID: ${LLM_MODEL_ID}
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
       MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
       MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
       DocSum_COMPONENT_NAME: ${DocSum_COMPONENT_NAME}
```
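Both Xeon compose files now pass the same token under two names, since newer HuggingFace tooling reads `HF_TOKEN` while older code paths still read `HUGGINGFACEHUB_API_TOKEN`. One way to confirm the interpolation before starting the stack, sketched with the standard `docker compose config` renderer:

```bash
# Render the effective configuration and check that both token
# variables resolve to the same value (token shown is a placeholder).
export HUGGINGFACEHUB_API_TOKEN="hf_xxx"
docker compose -f compose.yaml config | grep -E 'HF_TOKEN|HUGGINGFACEHUB_API_TOKEN'
```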

DocSum/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 37 additions & 21 deletions
````diff
@@ -23,40 +23,34 @@ This section describes how to quickly deploy and test the DocSum service manuall
 6. [Test the Pipeline](#test-the-pipeline)
 7. [Cleanup the Deployment](#cleanup-the-deployment)
 
-### Access the Code
+### Access the Code and Set Up Environment
 
-Clone the GenAIExample repository and access the ChatQnA Intel® Gaudi® platform Docker Compose files and supporting scripts:
+Clone the GenAIExample repository and access the DocSum Intel® Gaudi® platform Docker Compose files and supporting scripts:
 
-```
+```bash
 git clone https://github.com/opea-project/GenAIExamples.git
-cd GenAIExamples/DocSum/docker_compose/intel/hpu/gaudi/
+cd GenAIExamples/DocSum/docker_compose
+source intel/set_env.sh
 ```
 
-Checkout a released version, such as v1.2:
+NOTE: By default, vLLM performs a "warmup" at startup to optimize its performance for the specified model and the underlying platform, which can take a long time. For development (and, e.g., autoscaling), it can be skipped with `export VLLM_SKIP_WARMUP=true`.
 
-```
-git checkout v1.2
+Check out a released version, such as v1.3:
+
+```bash
+git checkout v1.3
 ```
 
 ### Generate a HuggingFace Access Token
 
 Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
 
-### Configure the Deployment Environment
-
-To set up environment variables for deploying DocSum services, source the _set_env.sh_ script in this directory:
-
-```
-source ./set_env.sh
-```
-
-The _set_env.sh_ script will prompt for required and optional environment variables used to configure the DocSum services. If a value is not entered, the script will use a default value for the same. It will also generate a _.env_ file defining the desired configuration. Consult the section on [DocSum Service configuration](#docsum-service-configuration) for information on how service specific configuration parameters affect deployments.
-
 ### Deploy the Services Using Docker Compose
 
 To deploy the DocSum services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
 
 ```bash
+cd intel/hpu/gaudi/
 docker compose up -d
 ```
 
@@ -80,13 +74,13 @@ Please refer to the table below to build different microservices from source:
 
 After running docker compose, check if all the containers launched via docker compose have started:
 
-```
+```bash
 docker ps -a
 ```
 
 For the default deployment, the following 5 containers should have started:
 
-```
+```bash
 CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS              PORTS                                       NAMES
 748f577b3c78   opea/whisper:latest            "python whisper_s…"      5 minutes ago   Up About a minute   0.0.0.0:7066->7066/tcp, :::7066->7066/tcp   docsum-gaudi-whisper-server
 4eq8b7034fd9   opea/docsum-gradio-ui:latest   "docker-entrypoint.s…"   5 minutes ago   Up About a minute   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   docsum-gaudi-ui-server
@@ -111,7 +105,7 @@ curl -X POST http://${host_ip}:8888/v1/docsum \
 
 To stop the containers associated with the deployment, execute the following command:
 
-```
+```bash
 docker compose -f compose.yaml down
 ```
 
@@ -161,13 +155,16 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "language=en" \
 ```
 
+Note that the `-F "messages="` flag is required, even for file uploads. Multiple files can be uploaded in a single call with multiple `-F "files=@/path"` inputs.
+
 ### Query with audio and video
 
-> Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
+> Audio and video can be passed as base64 strings or uploaded by providing a local file path.
 
 Audio:
 
 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
@@ -179,11 +176,21 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=audio" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp3, .wav)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```
 
 Video:
 
 ```bash
+# Send base64 string
 curl -X POST http://${host_ip}:8888/v1/docsum \
   -H "Content-Type: application/json" \
   -d '{"type": "video", "messages": "convert your video to base64 data type"}'
@@ -195,6 +202,15 @@ curl http://${host_ip}:8888/v1/docsum \
   -F "max_tokens=32" \
   -F "language=en" \
   -F "stream=True"
+
+# Upload file
+curl http://${host_ip}:8888/v1/docsum \
+  -H "Content-Type: multipart/form-data" \
+  -F "type=video" \
+  -F "messages=" \
+  -F "files=@/path to your file (.mp4)" \
+  -F "max_tokens=32" \
+  -F "language=en"
 ```
 
 ### Query with long context
````
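The note added in both READMEs says multiple files can be uploaded in one call by repeating the `files` field. A sketch with two hypothetical documents (`doc1.txt` and `doc2.pdf` are placeholders, and the `type=text` field is an assumption based on the audio/video examples; the empty `messages` field is still required):

```bash
# Summarize two documents in a single multipart request.
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "files=@doc1.txt" \
  -F "files=@doc2.pdf" \
  -F "max_tokens=32" \
  -F "language=en"
```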

DocSum/docker_compose/intel/hpu/gaudi/compose.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -18,6 +18,7 @@ services:
       OMPI_MCA_btl_vader_single_copy_mechanism: none
       LLM_MODEL_ID: ${LLM_MODEL_ID}
       NUM_CARDS: ${NUM_CARDS}
+      VLLM_SKIP_WARMUP: ${VLLM_SKIP_WARMUP:-false}
       VLLM_TORCH_PROFILER_DIR: "/mnt"
     healthcheck:
       test: ["CMD-SHELL", "curl -f http://localhost:80/health || exit 1"]
@@ -44,6 +45,7 @@ services:
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
       MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
       MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
       LLM_ENDPOINT: ${LLM_ENDPOINT}
```
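Because the new `VLLM_SKIP_WARMUP` entry uses the `${VLLM_SKIP_WARMUP:-false}` default, warmup stays enabled unless the variable is set in the shell, so skipping it requires no file edit:

```bash
# Skip the vLLM warmup pass for faster development restarts
# (per the NOTE added to the READMEs in this commit).
VLLM_SKIP_WARMUP=true docker compose up -d
```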

DocSum/docker_compose/intel/hpu/gaudi/compose_tgi.yaml

Lines changed: 1 addition & 0 deletions
```diff
@@ -49,6 +49,7 @@ services:
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
       MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
       MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
       LLM_ENDPOINT: ${LLM_ENDPOINT}
```

DocSum/docker_compose/set_env.sh renamed to DocSum/docker_compose/intel/set_env.sh

Lines changed: 11 additions & 4 deletions
```diff
@@ -6,24 +6,31 @@ pushd "../../" > /dev/null
 source .set_env.sh
 popd > /dev/null
 
+export host_ip=$(hostname -I | awk '{print $1}') # Example: host_ip="192.168.1.1"
 export no_proxy="${no_proxy},${host_ip}" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
 export http_proxy=$http_proxy
 export https_proxy=$https_proxy
-export host_ip=$(hostname -I | awk '{print $1}') # Example: host_ip="192.168.1.1"
-export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
 
 export LLM_ENDPOINT_PORT=8008
-export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
 export MAX_INPUT_TOKENS=1024
 export MAX_TOTAL_TOKENS=2048
 
 export LLM_PORT=9000
 export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
 export DocSum_COMPONENT_NAME="OpeaDocSumvLLM" # OpeaDocSumTgi
-
+export FRONTEND_SERVICE_PORT=5173
 export MEGA_SERVICE_HOST_IP=${host_ip}
 export LLM_SERVICE_HOST_IP=${host_ip}
 export ASR_SERVICE_HOST_IP=${host_ip}
 
 export BACKEND_SERVICE_PORT=8888
 export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:${BACKEND_SERVICE_PORT}/v1/docsum"
+
+export LOGFLAG=True
+
+export NUM_CARDS=1
+export BLOCK_SIZE=128
+export MAX_NUM_SEQS=256
+export MAX_SEQ_LEN_TO_CAPTURE=2048
```
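Since the relocated script now reads `HUGGINGFACEHUB_API_TOKEN` from the environment rather than a hard-coded placeholder, the token must be exported before sourcing. A minimal usage sketch following the README's call pattern (token value is a placeholder):

```bash
# Export the token, source the relocated script from the
# docker_compose directory, and spot-check a derived value.
export HUGGINGFACEHUB_API_TOKEN="hf_xxx"
cd GenAIExamples/DocSum/docker_compose
source intel/set_env.sh
echo "LLM endpoint: ${LLM_ENDPOINT}"
```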
