Combine all ChatQnA-related Docker images into one

- Remove `Dockerfile.faqgen`, `Dockerfile.without_rerank`, and `Dockerfile.guardrails`.
- Combine all types into a single `Dockerfile`, using the env `CHATQNA_TYPE` to select the variant.

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
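The PR description names the `CHATQNA_TYPE` switch but not its values or where it is consumed. As a rough sketch of how the combined image could be driven (the variant names below are illustrative guesses, not taken from the PR):

```bash
# One image now covers all ChatQnA pipelines; the variant is chosen via the
# CHATQNA_TYPE environment variable named in the PR description.
# NOTE: the values below (CHATQNA, FAQGEN, WITHOUT_RERANK, GUARDRAILS) are
# illustrative guesses -- check the merged Dockerfile for the real ones.
docker run -e CHATQNA_TYPE=FAQGEN opea/chatqna:latest
```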
+ To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:
- To construct the Mega Service with Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:
- To construct the Mega Service without Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna_without_rerank.py` Python script. Build the MegaService Docker image using the command below:
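The build command those paragraphs refer to is not part of this hunk. A typical invocation, assuming the usual GenAIExamples checkout layout and the merged `Dockerfile` from this PR, might be:

```bash
# Clone the examples repo and build the single combined MegaService image.
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA
docker build --no-cache -t opea/chatqna:latest \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  -f Dockerfile .
```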
- The FAQ (frequently asked questions and answers) generation deployment will generate FAQs instead of normal text generation. It adds a new microservice called `llm-faqgen`, which interacts with the TGI/vLLM LLM server to generate FAQs from input text. The ChatQnA backend image changes from `opea/chatqna:latest` to `opea/chatqna-faqgen:latest`, which depends on `llm-faqgen`.
+ The FAQ (frequently asked questions and answers) generation deployment will generate FAQs instead of normal text generation. It adds a new microservice called `llm-faqgen`, which interacts with the TGI/vLLM LLM server to generate FAQs from input text.
The TGI (Text Generation Inference) deployment and the default deployment differ primarily in their service configurations and in how they serve the large language model (LLM). The TGI deployment includes a unique `tgi-service`, which utilizes the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is specifically configured to run on Gaudi hardware. This service is designed to handle LLM tasks with optimizations such as `ENABLE_HPU_GRAPH` and `USE_FLASH_ATTENTION`. The `chatqna-gaudi-backend-server` in the TGI deployment depends on the `tgi-service`, whereas in the default deployment, it relies on the `vllm-service`.
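As a rough sketch of what that service amounts to outside of compose (only the image tag and the two optimization flags come from this page; the port mapping, model, and Habana runtime options are assumptions):

```bash
# Stand-alone launch of the Gaudi TGI container with the optimizations
# named above. Port, model, and runtime options are assumed, not confirmed.
docker run -d --name tgi-service \
  --runtime=habana --cap-add=sys_nice --ipc=host \
  -p 8005:80 \
  -e HABANA_VISIBLE_DEVICES=all \
  -e ENABLE_HPU_GRAPH=true \
  -e USE_FLASH_ATTENTION=true \
  ghcr.io/huggingface/tgi-gaudi:2.0.6 \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct
```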
@@ -188,16 +188,16 @@ The TGI (Text Generation Inference) deployment and the default deployment differ
- | chatqna-gaudi-backend-server | opea/chatqna-faqgen:latest| No |
+ |**llm-faqgen**|**opea/llm-faqgen:latest**| No |
+ | chatqna-gaudi-backend-server | opea/chatqna:latest| No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest| No |
| chatqna-gaudi-nginx-server | opea/nginx:latest| No |
We also provide a TGI-based deployment for FAQ generation, `compose_faqgen_tgi.yaml`, which simply replaces the `vllm-service` with a `tgi-service`.
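Bringing up the FAQ-generation variant is then just a choice of compose file. A sketch, assuming the standard `docker_compose/intel/hpu/gaudi` layout used by GenAIExamples and a `compose_faqgen.yaml` counterpart to the TGI file named above:

```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
# Default FAQ generation (vLLM-backed); this file name is assumed:
docker compose -f compose_faqgen.yaml up -d
# TGI-backed FAQ generation, as named in this section:
docker compose -f compose_faqgen_tgi.yaml up -d
```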
### compose_without_rerank.yaml - No ReRank Deployment
- The _compose_without_rerank.yaml_ Docker Compose file is distinct from the default deployment primarily due to the exclusion of the reranking service. In this version, the `tei-reranking-service`, which is typically responsible for providing reranking capabilities for text embeddings and is configured to run on Gaudi hardware, is absent. This omission simplifies the service architecture by removing a layer of processing that would otherwise enhance the ranking of text embeddings. Consequently, the `chatqna-gaudi-backend-server` in this deployment uses a specialized image, `opea/chatqna-without-rerank:latest`, indicating that it is tailored to function without the reranking feature. As a result, the backend server's dependencies are adjusted, without the need for the reranking service. This streamlined setup may impact the application's functionality and performance by focusing on core operations without the additional processing layer provided by reranking, potentially making it more efficient for scenarios where reranking is not essential and freeing Intel® Gaudi® accelerators for other tasks.
+ The _compose_without_rerank.yaml_ Docker Compose file is distinct from the default deployment primarily due to the exclusion of the reranking service. In this version, the `tei-reranking-service`, which is typically responsible for providing reranking capabilities for text embeddings and is configured to run on Gaudi hardware, is absent. This omission simplifies the service architecture by removing a layer of processing that would otherwise enhance the ranking of text embeddings. As a result, the backend server's dependencies are adjusted, without the need for the reranking service. This streamlined setup may impact the application's functionality and performance by focusing on core operations without the additional processing layer provided by reranking, potentially making it more efficient for scenarios where reranking is not essential and freeing Intel® Gaudi® accelerators for other tasks.
- | chatqna-gaudi-backend-server |**opea/chatqna-without-rerank:latest**| No |
+ | chatqna-gaudi-backend-server | opea/chatqna:latest| No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest| No |
| chatqna-gaudi-nginx-server | opea/nginx:latest| No |
This setup might allow for more Gaudi devices to be dedicated to the `vllm-service`, enhancing LLM processing capabilities and accommodating larger models. However, it also means that the benefits of reranking are sacrificed, which could impact the overall quality of the pipeline's output.
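Because every variant now ships the same `opea/chatqna:latest` backend, a single smoke test works across deployments. A sketch, assuming the usual ChatQnA mega-service defaults (port 8888 and the `/v1/chatqna` route are not stated in this diff):

```bash
# Smoke-test the deployed pipeline through the mega-service endpoint.
curl http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```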
- The _compose_guardrails.yaml_ Docker Compose file introduces enhancements over the default deployment by incorporating additional services focused on safety and ChatQnA response control. Notably, it includes the `tgi-guardrails-service` and `guardrails` services. The `tgi-guardrails-service` uses the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is configured to run on Gaudi hardware, providing functionality to manage input constraints and ensure safe operations within defined limits. The guardrails service, using the `opea/guardrails:latest` image, acts as a safety layer that interfaces with the `tgi-guardrails-service` to enforce safety protocols and manage interactions with the large language model (LLM). Additionally, the `chatqna-gaudi-backend-server` is updated to use the `opea/chatqna-guardrails:latest` image, indicating its design to integrate with these new guardrail services. This backend server now depends on the `tgi-guardrails-service` and `guardrails`, alongside existing dependencies like `redis-vector-db`, `tei-embedding-service`, `retriever`, `tei-reranking-service`, and `vllm-service`. The environment configurations for the backend are also updated to include settings for the guardrail services.
+ The _compose_guardrails.yaml_ Docker Compose file introduces enhancements over the default deployment by incorporating additional services focused on safety and ChatQnA response control. Notably, it includes the `tgi-guardrails-service` and `guardrails` services. The `tgi-guardrails-service` uses the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is configured to run on Gaudi hardware, providing functionality to manage input constraints and ensure safe operations within defined limits. The guardrails service, using the `opea/guardrails:latest` image, acts as a safety layer that interfaces with the `tgi-guardrails-service` to enforce safety protocols and manage interactions with the large language model (LLM). The `chatqna-gaudi-backend-server` now depends on the `tgi-guardrails-service` and `guardrails`, alongside existing dependencies like `redis-vector-db`, `tei-embedding-service`, `retriever`, `tei-reranking-service`, and `vllm-service`. The environment configurations for the backend are also updated to include settings for the guardrail services.
| Service Name | Image Name | Gaudi Specific | Uses LLM |
- | chatqna-gaudi-backend-server | opea/chatqna-guardrails:latest| No | No |
+ | chatqna-gaudi-backend-server | opea/chatqna:latest| No | No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest| No | No |
| chatqna-gaudi-nginx-server | opea/nginx:latest| No | No |
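A launch sketch for this variant; the exported variable below is a guess at the kind of guardrail setting the compose file reads, not a name confirmed by this diff:

```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
# Hypothetical safety-model setting consumed by the guardrail services:
export GUARDRAILS_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
docker compose -f compose_guardrails.yaml up -d
```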
@@ -266,8 +266,6 @@ The table provides a comprehensive overview of the ChatQnA services utilized acr
| tgi-guardrails-service | ghcr.io/huggingface/tgi-gaudi:2.0.6 | Yes | Provides guardrails functionality, ensuring safe operations within defined limits. |
| guardrails | opea/guardrails:latest| Yes | Acts as a safety layer, interfacing with the `tgi-guardrails-service` to enforce safety protocols. |
| chatqna-gaudi-backend-server | opea/chatqna:latest| No | Serves as the backend for the ChatQnA application, with variations depending on the deployment. |
- || opea/chatqna-without-rerank:latest|||
- || opea/chatqna-guardrails:latest|||
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest| No | Provides the user interface for the ChatQnA application. |
| chatqna-gaudi-nginx-server | opea/nginx:latest| No | Acts as a reverse proxy, managing traffic between the UI and backend services. |
| jaeger | jaegertracing/all-in-one:latest| Yes | Provides tracing and monitoring capabilities for distributed systems. |
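Since `jaegertracing/all-in-one` is included for tracing, it can be verified quickly (16686 is Jaeger's default UI port; whether these compose files publish it is an assumption):

```bash
# Expect 200 if the Jaeger UI is reachable on its default port.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:16686
```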