
Commit 3fb6060

Use official tei gaudi image and update tgi gaudi version (opea-project#810)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent c35fe0b commit 3fb6060

File tree

72 files changed (+8024, -154 lines)


.github/workflows/_example-workflow.yml

Lines changed: 0 additions & 4 deletions

```diff
@@ -64,10 +64,6 @@ jobs:
         run: |
           cd ${{ github.workspace }}/${{ inputs.example }}/docker_image_build
           docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml
-          if [[ $(grep -c "tei-gaudi:" ${docker_compose_path}) != 0 ]]; then
-            git clone https://github.com/huggingface/tei-gaudi.git
-            cd tei-gaudi && git rev-parse HEAD && cd ../
-          fi
           if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then
             git clone https://github.com/vllm-project/vllm.git
             cd vllm && git rev-parse HEAD && cd ../
```
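The deleted tei-gaudi branch mirrored the vllm branch that remains: grep the build file for a service reference, and clone its source only when found. A standalone sketch of that pattern, using an illustrative build.yaml rather than a real example's file:

```shell
# Sketch of the workflow's grep-and-clone pattern; the build.yaml content
# here is illustrative, not taken from the repository.
docker_compose_path=$(mktemp)
printf 'services:\n  vllm:\n    build: ./vllm\n' > "${docker_compose_path}"

# grep -c prints the number of matching lines; it exits non-zero when the
# count is 0, hence the `|| true` guard on the query that may find nothing.
vllm_refs=$(grep -c "vllm:" "${docker_compose_path}")
tei_refs=$(grep -c "tei-gaudi:" "${docker_compose_path}") || true

if [ "${vllm_refs}" != 0 ]; then
  echo "vllm referenced: the workflow would clone vllm-project/vllm here"
fi
if [ "${tei_refs}" = 0 ]; then
  echo "no tei-gaudi clone needed: the official prebuilt image is pulled instead"
fi
rm -f "${docker_compose_path}"
```

After this commit, no branch of the conditional builds tei-gaudi from source; the official prebuilt image is used.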

AgentQnA/docker_compose/intel/hpu/gaudi/compose.yaml

Lines changed: 6 additions & 2 deletions

```diff
@@ -3,7 +3,7 @@

 services:
   tgi-server:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.4
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
     container_name: tgi-server
     ports:
       - "8085:80"
@@ -13,12 +13,16 @@ services:
       no_proxy: ${no_proxy}
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
-      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
       HF_HUB_DISABLE_PROGRESS_BARS: 1
       HF_HUB_ENABLE_HF_TRANSFER: 0
       HABANA_VISIBLE_DEVICES: all
       OMPI_MCA_btl_vader_single_copy_mechanism: none
       PT_HPU_ENABLE_LAZY_COLLECTIVES: true
+      ENABLE_HPU_GRAPH: true
+      LIMIT_HPU_GRAPH: true
+      USE_FLASH_ATTENTION: true
+      FLASH_ATTENTION_RECOMPUTE: true
     runtime: habana
     cap_add:
       - SYS_NICE
```
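Alongside the version bump, the diff renames the in-container token variable from HF_TOKEN to HUGGING_FACE_HUB_TOKEN while the host still exports HUGGINGFACEHUB_API_TOKEN. A minimal sketch of that mapping, with a placeholder value rather than a real credential:

```shell
# Placeholder token standing in for the host's exported credential.
HUGGINGFACEHUB_API_TOKEN="hf_placeholder"

# compose maps the host variable onto the name the tgi-gaudi container reads,
# per this diff; only the variable name changes, not the value.
HUGGING_FACE_HUB_TOKEN="${HUGGINGFACEHUB_API_TOKEN}"
echo "container env: HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}"
```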

AudioQnA/docker_compose/intel/hpu/gaudi/compose.yaml

Lines changed: 6 additions & 2 deletions

```diff
@@ -51,7 +51,7 @@ services:
     environment:
       TTS_ENDPOINT: ${TTS_ENDPOINT}
   tgi-service:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.1
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
     container_name: tgi-gaudi-server
     ports:
       - "3006:80"
@@ -61,11 +61,15 @@ services:
       no_proxy: ${no_proxy}
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
-      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
       HF_HUB_DISABLE_PROGRESS_BARS: 1
       HF_HUB_ENABLE_HF_TRANSFER: 0
       HABANA_VISIBLE_DEVICES: all
       OMPI_MCA_btl_vader_single_copy_mechanism: none
+      ENABLE_HPU_GRAPH: true
+      LIMIT_HPU_GRAPH: true
+      USE_FLASH_ATTENTION: true
+      FLASH_ATTENTION_RECOMPUTE: true
     runtime: habana
     cap_add:
       - SYS_NICE
```

AudioQnA/kubernetes/intel/README_gmc.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -25,7 +25,7 @@ The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
 Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
 For Gaudi:

-- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
+- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
 - whisper-gaudi: opea/whisper-gaudi:latest
 - speecht5-gaudi: opea/speecht5-gaudi:latest
```

AudioQnA/kubernetes/intel/hpu/gaudi/manifest/audioqna.yaml

Lines changed: 10 additions & 2 deletions

```diff
@@ -271,7 +271,7 @@ spec:
       - envFrom:
         - configMapRef:
             name: audio-qna-config
-        image: ghcr.io/huggingface/tgi-gaudi:2.0.1
+        image: ghcr.io/huggingface/tgi-gaudi:2.0.5
         name: llm-dependency-deploy-demo
         securityContext:
           capabilities:
@@ -303,6 +303,14 @@ spec:
           value: none
         - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
           value: 'true'
+        - name: ENABLE_HPU_GRAPH
+          value: 'true'
+        - name: LIMIT_HPU_GRAPH
+          value: 'true'
+        - name: USE_FLASH_ATTENTION
+          value: 'true'
+        - name: FLASH_ATTENTION_RECOMPUTE
+          value: 'true'
         - name: runtime
           value: habana
         - name: HABANA_VISIBLE_DEVICES
@@ -315,7 +323,7 @@ spec:
       volumes:
       - name: model-volume
         hostPath:
-          path: /home/sdp/cesg
+          path: /mnt/models
         type: Directory
       - name: shm
         emptyDir:
```
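The manifest change adds the same four HPU-graph and flash-attention flags as the compose files. A hedged pre-apply check that all four are present in a rendered manifest, using a sample fragment in place of the repository file:

```shell
# Sample manifest fragment standing in for the repo's audioqna.yaml;
# in practice, point `manifest` at the real rendered file.
manifest=$(mktemp)
cat > "${manifest}" <<'EOF'
- name: ENABLE_HPU_GRAPH
  value: 'true'
- name: LIMIT_HPU_GRAPH
  value: 'true'
- name: USE_FLASH_ATTENTION
  value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
  value: 'true'
EOF

# Count how many of the four expected flags are absent.
missing=0
for flag in ENABLE_HPU_GRAPH LIMIT_HPU_GRAPH USE_FLASH_ATTENTION FLASH_ATTENTION_RECOMPUTE; do
  grep -q "name: ${flag}" "${manifest}" || missing=$((missing + 1))
done
echo "missing flags: ${missing}"
rm -f "${manifest}"
```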

AudioQnA/tests/test_compose_on_gaudi.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -22,7 +22,7 @@ function build_docker_images() {
     service_list="audioqna whisper-gaudi asr llm-tgi speecht5-gaudi tts"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
     docker images && sleep 1s
 }
```

AudioQnA/tests/test_compose_on_xeon.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -22,7 +22,7 @@ function build_docker_images() {
     service_list="audioqna whisper asr llm-tgi speecht5 tts"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
     docker images && sleep 1s
 }
```
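Since the commit bumps tgi-gaudi tags in several files (2.0.1, 2.0.4, and 1.2.1 all move to 2.0.5), a quick audit can confirm nothing still pins a superseded tag. A hedged sketch, with a sample file standing in for the repository's scripts:

```shell
# Sample script standing in for the repo's test scripts; in practice,
# run the grep recursively over the repository checkout instead.
script=$(mktemp)
echo 'docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5' > "${script}"

# Count lines that still reference one of the superseded tags.
# grep -c exits non-zero on zero matches, hence the `|| true` guard.
stale=$(grep -cE 'tgi-gaudi:(1\.2\.1|2\.0\.1|2\.0\.4)' "${script}") || true
echo "stale tag references: ${stale}"
rm -f "${script}"
```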
2828
