[ChatQnA] Switch to vLLM as default llm backend on Gaudi (#1404)
Switch from TGI to vLLM as the default LLM serving backend on Gaudi for the ChatQnA example to improve performance.
#1213
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
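For orientation, below is a minimal sketch of what a vLLM serving service on Gaudi can look like in a Compose file. The image name, port mapping, and flags are illustrative assumptions, not the exact contents of this change:

```yaml
services:
  vllm-service:
    image: opea/vllm-gaudi:latest   # assumed image name for the Gaudi vLLM build
    ports:
      - "8007:80"                   # assumed host port
    environment:
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HABANA_VISIBLE_DEVICES: all
    runtime: habana                 # Habana container runtime exposes Gaudi devices
    cap_add:
      - SYS_NICE
    ipc: host
    # standard flags of the vLLM OpenAI-compatible server
    command: --model ${LLM_MODEL_ID} --tensor-parallel-size 1 --host 0.0.0.0 --port 80
```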
ChatQnA/docker_compose/intel/hpu/gaudi/README.md (+31 −18)
@@ -1,6 +1,8 @@
# Build MegaService of ChatQnA on Gaudi
-This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service.
+This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
+
+The default pipeline deploys vLLM as the LLM serving component and leverages the rerank component. It also provides options to drop rerank from the pipeline, add guardrails, or use the TGI backend for the LLM microservice; refer to the [start-all-the-services-docker-containers](#start-all-the-services-docker-containers) section on this page.
Quick Start:
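The paragraph added in the hunk above mentions several pipeline variants. As a sketch of how a variant would be selected — the compose file names other than `compose.yaml` are assumptions about the repository layout, not taken from this diff:

```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi

# Default pipeline: vLLM serving backend with rerank enabled
docker compose -f compose.yaml up -d

# Assumed variant files (names are illustrative):
docker compose -f compose_tgi.yaml up -d             # TGI instead of vLLM
docker compose -f compose_without_rerank.yaml up -d  # skip the rerank microservice
docker compose -f compose_guardrails.yaml up -d      # add guardrails
```

Each variant is a separate compose file rather than a flag, so switching backends is a matter of pointing `docker compose -f` at a different file.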
@@ -184,15 +186,18 @@ By default, the embedding, reranking and LLM models are set to a default value a
Change the `xxx_MODEL_ID` below for your needs.
-For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. TGI can load the models either online or offline as described below:
+For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. vLLM/TGI can load the models either online or offline as described below:
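As a hedged sketch of the online and offline options — the mirror endpoint and local paths below are illustrative, not values from this change:

```bash
# Online: download through a Huggingface mirror (hf-mirror.com is only an
# example endpoint); pass the same variable into the serving container
export HF_ENDPOINT=https://hf-mirror.com

# Offline: pre-download the model to a local directory, then mount that
# directory into the vLLM/TGI container and point the model ID at the local path
huggingface-cli download ${LLM_MODEL_ID} --local-dir ./data/models
```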