[ChatQnA] Switch to vLLM as default llm backend on Xeon (#1403)
Switch from TGI to vLLM as the default LLM serving backend on Xeon for the ChatQnA example to improve performance.
#1213
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
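For orientation, here is a minimal sketch of how the two backends would be selected at deploy time on Xeon; the checkout path, the `set_env.sh` helper, and the `compose.yaml`/`compose_tgi.yaml` file names are assumptions based on the READMEs touched below, not an excerpt from this change.

```bash
# A sketch, not an excerpt from this PR: bring up the default (vLLM-backed)
# ChatQnA pipeline on Xeon, or fall back to the TGI variant.
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon    # assumed checkout path
export host_ip=$(hostname -I | awk '{print $1}')          # IP the services advertise
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"           # needed for gated models
source set_env.sh                                         # assumed helper that sets the remaining *_MODEL_ID / proxy vars

# Default after this change: vLLM serves the LLM.
docker compose -f compose.yaml up -d

# Alternative: keep TGI as the LLM serving backend (assumed file name).
# docker compose -f compose_tgi.yaml up -d
```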
ChatQnA/docker_compose/intel/cpu/xeon/README.md (25 additions, 21 deletions)
@@ -1,6 +1,8 @@
 # Build Mega Service of ChatQnA on Xeon

-This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service.
+This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
+
+The default pipeline deploys with vLLM as the LLM serving component and leverages rerank component. It also provides options of not using rerank in the pipeline and using TGI backend for LLM microservice, please refer to [start-all-the-services-docker-containers](#start-all-the-services-docker-containers) section in this page. Besides, refer to [Build with Pinecone VectorDB](./README_pinecone.md) and [Build with Qdrant VectorDB](./README_qdrant.md) for other deployment variants.

 Quick Start:
@@ -186,14 +188,17 @@ By default, the embedding, reranking and LLM models are set to a default value a

 Change the `xxx_MODEL_ID` below for your needs.

-For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. TGI can load the models either online or offline as described below:
+For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. The vLLM/TGI can load the models either online or offline as described below:
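As an illustration of the offline option mentioned in that hunk, below is a sketch of pre-downloading a model through a Huggingface mirror and serving it from a local path; the model name, mirror, and mount point are assumptions, and the exact volume and environment wiring lives in the compose file.

```bash
# A sketch of the offline path: pre-download a model through a Huggingface
# mirror, then serve it from the local copy instead of downloading at startup.
pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com                  # route downloads through a mirror
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./data/Meta-Llama-3-8B-Instruct             # example model, pick your own

# Mount ./data into the vLLM/TGI container (e.g. as /data) and point
# LLM_MODEL_ID at the mounted path so no online download is attempted.
```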
ChatQnA/docker_compose/intel/cpu/xeon/README_pinecone.md (11 additions, 10 deletions)
@@ -1,6 +1,8 @@
 # Build Mega Service of ChatQnA on Xeon

-This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service.
+This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
+
+The default pipeline deploys with vLLM as the LLM serving component and leverages rerank component.

 Quick Start:

@@ -189,15 +191,15 @@ By default, the embedding, reranking and LLM models are set to a default value a

 Change the `xxx_MODEL_ID` below for your needs.

-For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. TGI can load the models either online or offline as described below:
+For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. The vLLM can load the models either online or offline as described below:
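The "Change the `xxx_MODEL_ID`" step above can be illustrated with a short sketch; the variable names follow the usual ChatQnA convention and the values shown are illustrative, not prescribed by this PR.

```bash
# A sketch of overriding the default model IDs before starting the services.
# Variable names and values are illustrative; check the README table for the
# actual defaults used by the pipeline.
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
docker compose -f compose.yaml up -d
```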
ChatQnA/docker_compose/intel/cpu/xeon/README_qdrant.md (12 additions, 10 deletions)
@@ -1,6 +1,8 @@
 # Build Mega Service of ChatQnA (with Qdrant) on Xeon

-This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service.
+This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
+
+The default pipeline deploys with vLLM as the LLM serving component and leverages rerank component.
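Once the default stack is up, a quick way to sanity-check the new vLLM backend is to hit its OpenAI-compatible endpoint; the host port shown here is an assumption and should match the port published in the compose file.

```bash
# A sketch of a sanity check against the vLLM serving endpoint once the stack
# is up. Port 9009 is an assumption; use the port published in compose.yaml.
curl http://${host_ip}:9009/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL_ID}\", \"messages\": [{\"role\": \"user\", \"content\": \"What is deep learning?\"}], \"max_tokens\": 32}"
```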