AgentQnA - Adding files to deploy an application in the K8S environment using Helm #1793

Status: Open — wants to merge 37 commits into base branch `main`.

Commits (37):
- cf60682: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 1fd1de1: DocSum - fix main (Feb 13, 2025)
- bd2d47e: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 2459ecb: DocSum - fix main (Feb 13, 2025)
- 4d35065: Merge remote-tracking branch 'origin/main' (Feb 19, 2025)
- 6d5049d: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 9dfbdc5: DocSum - fix main (Feb 13, 2025)
- a8857ae: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 5a38b26: DocSum - fix main (Feb 13, 2025)
- 0e2ef94: Merge remote-tracking branch 'origin/main' (Feb 25, 2025)
- 30071db: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Mar 11, 2025)
- 0757dec: Merge branch 'opea-project:main' into main (artem-astafev, Mar 20, 2025)
- 9aaf378: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Mar 26, 2025)
- 9cf4b6e: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 3, 2025)
- 8e89787: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 5, 2025)
- a117c69: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 11, 2025)
- 198d50e: AgentQnA - Adding files to deploy an application in the K8S environme… (Apr 11, 2025)
- 9bc4a37: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 22, 2025)
- 015faa6: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 22, 2025)
- 78e9095: Merge branch 'main' into feature/AgentQnA_k8s (chyundunovDatamonsters, Apr 22, 2025)
- 02f4986: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters, Apr 24, 2025)
- 62c1c5f: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- 834cf04: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 24, 2025)
- 2dba72a: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- e68e869: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 24, 2025)
- 42e2b0c: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- 944e9fd: Merge remote-tracking branch 'origin/feature/AgentQnA_k8s' into featu… (chyundunovDatamonsters, Apr 24, 2025)
- 9e42005: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- de2fbb5: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- 5479cce: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- f55d7f9: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- 3765a62: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, Apr 24, 2025)
- 591115d: Merge branch 'main' into feature/AgentQnA_k8s (chyundunovDatamonsters, Apr 25, 2025)
- 7063ba0: Merge branch 'main' into feature/AgentQnA_k8s (mkbhanda, Apr 25, 2025)
- 4cadccd: Merge branch 'main' into feature/AgentQnA_k8s (xiguiw, May 16, 2025)
- 5e55c81: AgentQnA - Adding files to deploy an application in the K8S environme… (chyundunovDatamonsters, May 27, 2025)
- a3403c7: Merge branch 'main' into feature/AgentQnA_k8s (chensuyue, May 30, 2025)
14 changes: 14 additions & 0 deletions AgentQnA/kubernetes/helm/README.md
@@ -9,3 +9,17 @@
export HFTOKEN="insert-your-huggingface-token-here"
helm install agentqna oci://ghcr.io/opea-project/charts/agentqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
```

## Deploy on ROCm with vLLM

```
export HFTOKEN="insert-your-huggingface-token-here"
helm upgrade --install agentqna oci://ghcr.io/opea-project/charts/agentqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f rocm-values.yaml
```

## Deploy on ROCm with TGI

```
export HFTOKEN="insert-your-huggingface-token-here"
helm upgrade --install agentqna oci://ghcr.io/opea-project/charts/agentqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f rocm-tgi-values.yaml
```
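The three install commands above differ only in the values file passed via `-f`. A small Python sketch of that pattern (the backend-to-file mapping mirrors the chart files in this PR; the `helm_command` helper itself is hypothetical, not part of the chart):

```python
import shlex

# Values overlays shipped with the agentqna chart, keyed by target backend.
VALUES_FILES = {
    "gaudi": "gaudi-values.yaml",
    "rocm-vllm": "rocm-values.yaml",
    "rocm-tgi": "rocm-tgi-values.yaml",
}

def helm_command(backend: str, hf_token: str) -> str:
    """Assemble the `helm upgrade --install` command line for one backend."""
    args = [
        "helm", "upgrade", "--install", "agentqna",
        "oci://ghcr.io/opea-project/charts/agentqna",
        "--set", f"global.HUGGINGFACEHUB_API_TOKEN={hf_token}",
        "-f", VALUES_FILES[backend],
    ]
    return " ".join(shlex.quote(a) for a in args)

print(helm_command("rocm-vllm", "hf_xxx"))
```

Using `upgrade --install` (as the ROCm sections do) is idempotent: it installs the release if absent and upgrades it otherwise.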
55 changes: 55 additions & 0 deletions AgentQnA/kubernetes/helm/rocm-tgi-values.yaml
@@ -0,0 +1,55 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values
vllm:
  enabled: false
tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  LLM_MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: [ "--num-shard", "1" ]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
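The `llm_endpoint_url` fields above are Helm template strings: at render time `{{ .Release.Name }}` expands to the release name passed to `helm install`. A minimal Python illustration of that substitution, assuming the release is named `agentqna` as in the README's install command:

```python
# Illustrates how Helm expands {{ .Release.Name }} in llm_endpoint_url.
# "agentqna" is the release name used in the README's helm install command.
release_name = "agentqna"
template = "http://{{ .Release.Name }}-tgi"
endpoint = template.replace("{{ .Release.Name }}", release_name)
print(endpoint)  # http://agentqna-tgi
```

This is why the agents need no hard-coded hostname: the TGI service name is derived from the release name at deploy time.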
51 changes: 51 additions & 0 deletions AgentQnA/kubernetes/helm/rocm-values.yaml
@@ -0,0 +1,51 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  LLM_MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
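The two ROCm values files are mutually exclusive overlays: each enables exactly one inference engine and disables the other, so every agent in a release talks to a single backend. A quick sanity-check sketch (the dicts simply mirror the `enabled` flags from the two files above):

```python
# Mirror of the engine toggles in rocm-values.yaml and rocm-tgi-values.yaml.
overlays = {
    "rocm-values.yaml": {"tgi": {"enabled": False}, "vllm": {"enabled": True}},
    "rocm-tgi-values.yaml": {"vllm": {"enabled": False}, "tgi": {"enabled": True}},
}

for name, values in overlays.items():
    enabled = [eng for eng in ("tgi", "vllm") if values[eng]["enabled"]]
    # Each overlay must enable exactly one inference engine.
    assert len(enabled) == 1, f"{name} must enable exactly one engine"
    print(f"{name}: {enabled[0]} enabled")
```

Enabling both subcharts at once would schedule two GPU-hungry inference servers, which is why each overlay explicitly disables the engine it does not use.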