Commit c795ef2

Add helm deployment instructions for GenAIExamples (#1373)
Add helm deployment instructions for ChatQnA, AgentQnA, AudioQnA, CodeTrans, DocSum, FaqGen and VisualQnA

Signed-off-by: Dolpher Du <dolpher.du@intel.com>
1 parent 99120f4 commit c795ef2

104 files changed, +828 -14982 lines changed

AgentQnA/README.md

Lines changed: 4 additions & 0 deletions
@@ -186,6 +186,10 @@ docker build -t opea/agent:latest --build-arg https_proxy=$https_proxy --build-a
 :::
 ::::
 
+## Deploy using Helm Chart
+
+Refer to the [AgentQnA helm chart](./kubernetes/helm/README.md) for instructions on deploying AgentQnA on Kubernetes.
+
 ## Validate services
 
 First look at logs of the agent docker containers:

AgentQnA/kubernetes/helm/README.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Deploy AgentQnA on Kubernetes cluster
+
+- You should have Helm (version >= 3.15) installed. Refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) for more information.
+- For more deploy options, refer to [helm charts README](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts#readme).
+
+## Deploy on Gaudi
+
+```
+export HFTOKEN="insert-your-huggingface-token-here"
+helm install agentqna oci://ghcr.io/opea-project/charts/agentqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
+```
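
The install command above expands `${HFTOKEN}` unquoted, so an unset variable would pass an empty token to the chart without any error from the shell. A minimal sketch of a guard, assuming bash; `require_token` is a hypothetical helper and is not part of this commit or of Helm:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the commit): fail early when the Hugging Face
# token is empty or still holds the placeholder text used in the README.
require_token() {
  local token="${1:-}"
  if [ -z "$token" ] || [ "$token" = "insert-your-huggingface-token-here" ]; then
    echo "HFTOKEN is not set to a real token" >&2
    return 1
  fi
  return 0
}

# Usage, wrapping the helm command from the README:
# require_token "${HFTOKEN:-}" && helm install agentqna \
#   oci://ghcr.io/opea-project/charts/agentqna \
#   --set global.HUGGINGFACEHUB_API_TOKEN="${HFTOKEN}" -f gaudi-values.yaml
```

Quoting `"${HFTOKEN}"` in the wrapped command is a defensive choice; the README's unquoted form behaves the same for tokens without spaces or shell metacharacters.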
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# Accelerate inferencing in heaviest components to improve performance
+# by overriding their subchart values
+
+tgi:
+  enabled: true
+  accelDevice: "gaudi"
+  image:
+    repository: ghcr.io/huggingface/tgi-gaudi
+    tag: "2.0.6"
+  resources:
+    limits:
+      habana.ai/gaudi: 4
+  MAX_INPUT_LENGTH: "4096"
+  MAX_TOTAL_TOKENS: "8192"
+  CUDA_GRAPHS: ""
+  OMPI_MCA_btl_vader_single_copy_mechanism: "none"
+  PT_HPU_ENABLE_LAZY_COLLECTIVES: "true"
+  ENABLE_HPU_GRAPH: "true"
+  LIMIT_HPU_GRAPH: "true"
+  USE_FLASH_ATTENTION: "true"
+  FLASH_ATTENTION_RECOMPUTE: "true"
+  extraCmdArgs: ["--sharded","true","--num-shard","4"]
+  livenessProbe:
+    initialDelaySeconds: 5
+    periodSeconds: 5
+    timeoutSeconds: 1
+  readinessProbe:
+    initialDelaySeconds: 5
+    periodSeconds: 5
+    timeoutSeconds: 1
+  startupProbe:
+    initialDelaySeconds: 5
+    periodSeconds: 5
+    timeoutSeconds: 1
+    failureThreshold: 120
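
The `startupProbe` settings above determine roughly how long the TGI container may take to start before Kubernetes restarts it: about `initialDelaySeconds` plus `periodSeconds` times `failureThreshold`. A small sketch of that arithmetic, assuming bash; `startup_budget` is a hypothetical helper name and the result is an approximation, not an exact guarantee from Kubernetes:

```shell
#!/usr/bin/env bash
# Approximate startup window in seconds:
# the initial delay plus one probe period per allowed failure.
startup_budget() {
  local initial_delay="$1" period="$2" failure_threshold="$3"
  echo $(( initial_delay + period * failure_threshold ))
}

# Values taken from the startupProbe in the values file above.
startup_budget 5 5 120   # prints 605
```

About ten minutes is therefore available for model download and warm-up; a larger model or slower storage may need a higher `failureThreshold`.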

AudioQnA/README.md

Lines changed: 4 additions & 0 deletions
@@ -71,6 +71,10 @@ Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for instr
 
 Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for instructions on deploying AudioQnA on Xeon.
 
+## Deploy using Helm Chart
+
+Refer to the [AudioQnA helm chart](./kubernetes/helm/README.md) for instructions on deploying AudioQnA on Kubernetes.
+
 ## Supported Models
 
 ### ASR

AudioQnA/kubernetes/helm/README.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Deploy AudioQnA on Kubernetes cluster
+
+- You should have Helm (version >= 3.15) installed. Refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) for more information.
+- For more deploy options, refer to [helm charts README](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts#readme).
+
+## Deploy on Xeon
+
+```
+export HFTOKEN="insert-your-huggingface-token-here"
+helm install audioqna oci://ghcr.io/opea-project/charts/audioqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f cpu-values.yaml
+```
+
+## Deploy on Gaudi
+
+```
+export HFTOKEN="insert-your-huggingface-token-here"
+helm install audioqna oci://ghcr.io/opea-project/charts/audioqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
+```
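
The Xeon and Gaudi commands above differ only in the values file passed with `-f`. A minimal sketch of selecting that file from a hardware target, assuming bash; `values_file_for` is a hypothetical helper and the target names `xeon` and `gaudi` are illustrative labels, not options defined by the chart:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a hardware target to the values file named in the README.
values_file_for() {
  case "$1" in
    xeon)  echo "cpu-values.yaml" ;;
    gaudi) echo "gaudi-values.yaml" ;;
    *)     echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Usage, reusing the helm command from the README:
# helm install audioqna oci://ghcr.io/opea-project/charts/audioqna \
#   --set global.HUGGINGFACEHUB_API_TOKEN="${HFTOKEN}" -f "$(values_file_for gaudi)"
```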
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+tgi:
+  LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+tgi:
+  accelDevice: "gaudi"
+  image:
+    repository: ghcr.io/huggingface/tgi-gaudi
+    tag: "2.0.6"
+  resources:
+    limits:
+      habana.ai/gaudi: 1
+  MAX_INPUT_LENGTH: "1024"
+  MAX_TOTAL_TOKENS: "2048"
+  CUDA_GRAPHS: ""
+  HF_HUB_DISABLE_PROGRESS_BARS: 1
+  HF_HUB_ENABLE_HF_TRANSFER: 0
+  ENABLE_HPU_GRAPH: true
+  LIMIT_HPU_GRAPH: true
+  USE_FLASH_ATTENTION: true
+  FLASH_ATTENTION_RECOMPUTE: true
+  livenessProbe:
+    initialDelaySeconds: 5
+    periodSeconds: 5
+    timeoutSeconds: 1
+  readinessProbe:
+    initialDelaySeconds: 5
+    periodSeconds: 5
+    timeoutSeconds: 1
+  startupProbe:
+    initialDelaySeconds: 5
+    periodSeconds: 5
+    timeoutSeconds: 1
+    failureThreshold: 120
+
+whisper:
+  resources:
+    limits:
+      habana.ai/gaudi: 1
+
+speecht5:
+  resources:
+    limits:
+      habana.ai/gaudi: 1
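
The values above request one `habana.ai/gaudi` device each for `tgi`, `whisper` and `speecht5`. A small sketch of the total the cluster would need to provide, assuming bash and assuming one replica per subchart (the replica count is not stated in this file); `total_gaudi_cards` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sum the habana.ai/gaudi limits requested by each subchart.
total_gaudi_cards() {
  local total=0 n
  for n in "$@"; do
    total=$(( total + n ))
  done
  echo "$total"
}

# Limits from the values file above: tgi, whisper, speecht5.
total_gaudi_cards 1 1 1   # prints 3
```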

AudioQnA/kubernetes/intel/README.md

Lines changed: 0 additions & 32 deletions
This file was deleted.
