diff --git a/.github/workflows/pr-link-path-scan.yml b/.github/workflows/pr-link-path-scan.yml index 77bf0d293f..fd888af4be 100644 --- a/.github/workflows/pr-link-path-scan.yml +++ b/.github/workflows/pr-link-path-scan.yml @@ -73,6 +73,7 @@ jobs: - name: Checking Relative Path Validity run: | + set -x cd ${{github.workspace}} fail="FALSE" repo_name=${{ github.event.pull_request.head.repo.full_name }} diff --git a/ChatQnA/README.md b/ChatQnA/README.md index 01d1c92b7a..bd188d3671 100644 --- a/ChatQnA/README.md +++ b/ChatQnA/README.md @@ -23,14 +23,15 @@ RAG bridges the knowledge gap by dynamically fetching relevant information from | Azure | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress | | Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress | -## Automated Deployment to Ubuntu based system(if not using Terraform) using Intel® Optimized Cloud Modules for **Ansible** +## Automated Deployment to Ubuntu based system (if not using Terraform) using Intel® Optimized Cloud Modules for **Ansible** To deploy to existing Xeon Ubuntu based system, use our Intel Optimized Cloud Modules for Ansible. This is the same Ansible playbook used by Terraform. Use this if you are not using Terraform and have provisioned your system with another tool or manually including bare metal. -| Operating System | Intel Optimized Cloud Module for Ansible | -|------------------|------------------------------------------| -| Ubuntu 20.04 | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) | -| Ubuntu 22.04 | Work-in-progress | + +| Operating System | Intel Optimized Cloud Module for Ansible | +| ---------------- | ----------------------------------------------------------------------------------------------------------------- | +| Ubuntu 20.04 | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) | +| Ubuntu 22.04 | Work-in-progress | ## Manually Deploy ChatQnA Service @@ -48,7 +49,7 @@ Note: 1. If you do not have docker installed you can run this script to install docker : `bash docker_compose/install_docker.sh`. -2. The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). +2. The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) `or` you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). ### Quick Start: 1.Setup Environment Variable @@ -221,13 +222,14 @@ This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Gene In the below, we provide a table that describes for each microservice component in the ChatQnA architecture, the default configuration of the open source project, hardware, port, and endpoint. 
Gaudi default compose.yaml -| MicroService | Open Source Project | HW | Port | Endpoint | + +| MicroService | Open Source Project | HW | Port | Endpoint | | ------------ | ------------------- | ----- | ---- | -------------------- | -| Embedding | Langchain | Xeon | 6000 | /v1/embeddings | -| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval | -| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking | -| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions | -| Dataprep | Redis, Langchain | Xeon | 6007 | /v1/dataprep/ingest | +| Embedding | Langchain | Xeon | 6000 | /v1/embeddings | +| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval | +| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking | +| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions | +| Dataprep | Redis, Langchain | Xeon | 6007 | /v1/dataprep/ingest | ### Required Models diff --git a/ChatQnA/assets/img/ui-result-page-faqgen.png b/ChatQnA/assets/img/ui-result-page-faqgen.png new file mode 100644 index 0000000000..ac0e654a83 Binary files /dev/null and b/ChatQnA/assets/img/ui-result-page-faqgen.png differ diff --git a/ChatQnA/assets/img/ui-result-page.png b/ChatQnA/assets/img/ui-result-page.png new file mode 100644 index 0000000000..186e6e6b69 Binary files /dev/null and b/ChatQnA/assets/img/ui-result-page.png differ diff --git a/ChatQnA/assets/img/ui-starting-page.png b/ChatQnA/assets/img/ui-starting-page.png new file mode 100644 index 0000000000..52d2414ee5 Binary files /dev/null and b/ChatQnA/assets/img/ui-starting-page.png differ diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/README.md b/ChatQnA/docker_compose/amd/gpu/rocm/README.md index 02cd4ce41d..0edcf44141 100644 --- a/ChatQnA/docker_compose/amd/gpu/rocm/README.md +++ b/ChatQnA/docker_compose/amd/gpu/rocm/README.md @@ -1,459 +1,621 @@ -# Build and deploy ChatQnA Application on AMD GPU (ROCm) +# Build and Deploy ChatQnA Application on AMD GPU (ROCm) -## Build MegaService of ChatQnA on AMD ROCm GPU +## Build Docker Images -This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD ROCm GPU platform. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service. +### 1. Build Docker Image -Quick Start Deployment Steps: +- #### Create application install directory and go to it: -1. Set up the environment variables. -2. Run Docker Compose. -3. Consume the ChatQnA Service. + ```bash + mkdir ~/chatqna-install && cd chatqna-install + ``` -Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). +- #### Clone the repository GenAIExamples (the default repository branch "main" is used here): -## Quick Start: 1.Setup Environment Variable + ```bash + git clone https://github.com/opea-project/GenAIExamples.git + ``` -To set up environment variables for deploying ChatQnA services, follow these steps: + If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value): -1. 
Set the required environment variables: + ```bash + git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3 + ``` - ```bash - # Example: host_ip="192.168.1.1" - export HOST_IP=${host_ip} - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token} - ``` + We remind you that when using a specific version of the code, you need to use the README from this version: -2. If you are in a proxy environment, also set the proxy-related environment variables: +- #### Go to build directory: - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - ``` + ```bash + cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_image_build + ``` -3. Set up other environment variables: +- Cleaning up the GenAIComps repository if it was previously cloned in this directory. + This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty: - ```bash - source ./set_env.sh - ``` + ```bash + echo Y | rm -R GenAIComps + ``` -## Quick Start: 2.Run Docker Compose +- #### Clone the repository GenAIComps (the default repository branch "main" is used here): -```bash -docker compose up -d -``` + ```bash + git clone https://github.com/opea-project/GenAIComps.git + ``` -It will automatically download the docker image on `docker hub`: + If you use a specific tag of the GenAIExamples repository, + then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value): -```bash -docker pull opea/chatqna:latest -docker pull opea/chatqna-ui:latest -``` + ```bash + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3 + ``` -In following cases, you could build docker image from source by yourself. + We remind you that when using a specific version of the code, you need to use the README from this version. -- Failed to download the docker image. +- #### Setting the list of images for the build (from the build file.yaml) -- If you want to use a specific version of Docker image. + If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows: -Please refer to 'Build Docker Images' in below. + #### vLLM-based application -## QuickStart: 3.Consume the ChatQnA Service + ```bash + service_list="dataprep retriever vllm-rocm chatqna chatqna-ui nginx" + ``` -Prepare and upload test document + #### vLLM-based application with FaqGen -``` -# download pdf file -wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -# upload pdf file with dataprep -curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` + ```bash + service_list="dataprep retriever vllm-rocm llm-faqgen chatqna chatqna-ui nginx" + ``` -Get MegaSerice(backend) response: + #### TGI-based application -```bash -curl http://${host_ip}:8888/v1/chatqna \ - -H "Content-Type: application/json" \ - -d '{ - "messages": "What is the revenue of Nike in 2023?" - }' -``` + ```bash + service_list="dataprep retriever chatqna chatqna-ui nginx" + ``` -## 🚀 Build Docker Images + #### TGI-based application with FaqGen -First of all, you need to build Docker Images locally. This step can be ignored after the Docker images published to Docker hub. + ```bash + service_list="dataprep retriever llm-faqgen chatqna chatqna-ui nginx" + ``` -### 1. 
Source Code install GenAIComps +- #### Pull Docker Images -```bash -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -``` + ```bash + docker pull redis/redis-stack:7.2.0-v9 + docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + ``` -### 2. Build Retriever Image +- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI) -```bash -docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile . + ```bash + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + ``` + +- #### Build Docker Images + + ```bash + docker compose -f build.yaml build ${service_list} --no-cache + ``` + + After the build, we check the list of images with the command: + + ```bash + docker image ls + ``` + + The list of images should include: + + ##### vLLM-based application: + + - redis/redis-stack:7.2.0-v9 + - opea/dataprep:latest + - ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + - opea/retriever:latest + - opea/vllm-rocm:latest + - opea/chatqna:latest + - opea/chatqna-ui:latest + - opea/nginx:latest + + ##### vLLM-based application with FaqGen: + + - redis/redis-stack:7.2.0-v9 + - opea/dataprep:latest + - ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + - opea/retriever:latest + - opea/vllm-rocm:latest + - opea/llm-faqgen:latest + - opea/chatqna:latest + - opea/chatqna-ui:latest + - opea/nginx:latest + + ##### TGI-based application: + + - redis/redis-stack:7.2.0-v9 + - opea/dataprep:latest + - ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + - opea/retriever:latest + - ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + - opea/chatqna:latest + - opea/chatqna-ui:latest + - opea/nginx:latest + + ##### TGI-based application with FaqGen: + + - redis/redis-stack:7.2.0-v9 + - opea/dataprep:latest + - ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + - opea/retriever:latest + - ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + - opea/llm-faqgen:latest + - opea/chatqna:latest + - opea/chatqna-ui:latest + - opea/nginx:latest + +--- + +## Deploy the ChatQnA Application + +### Docker Compose Configuration for AMD GPUs + +To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file: + +- compose_vllm.yaml - for vLLM-based application +- compose_faqgen_vllm.yaml - for vLLM-based application with FaqGen +- compose.yaml - for TGI-based +- compose_faqgen.yaml - for TGI-based application with FaqGen + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri:/dev/dri +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined ``` -### 3. Build Dataprep Image +This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: -```bash -docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile . +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card0:/dev/dri/card0 + - /dev/dri/render128:/dev/dri/render128 +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined ``` -### 4. Build FaqGen LLM Image (Optional) +**How to Identify GPU Device IDs:** +Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. 
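For example, a quick way to see which device nodes exist on the host (assuming the `amdgpu` driver is loaded; `rocm-smi` is only available if the ROCm utilities are installed):

```bash
# List the DRM device nodes created by the amdgpu driver
ls -l /dev/dri/
# Typical output includes card0, card1, ... and renderD128, renderD129, ...

# Optionally, list the GPUs visible to the ROCm stack to match them to device IDs
rocm-smi --showproductname
```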
+ +### Set deploy environment variables + +#### Setting variables in the operating system environment: -If you want to enable FAQ generation LLM in the pipeline, please use the below command: +##### Set variable HUGGINGFACEHUB_API_TOKEN: ```bash -docker build -t opea/llm-faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/faq-generation/Dockerfile . +### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. +export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' ``` -### 5. Build MegaService Docker Image +#### Set variables value in set_env\*\*\*\*.sh file: -To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below: +Go to Docker Compose directory: ```bash -git clone https://github.com/opea-project/GenAIExamples.git -cd GenAIExamples/ChatQnA/docker -docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -cd ../../.. +cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm ``` -### 6. Build UI Docker Image +The example uses the Nano text editor. You can use any convenient text editor: -Construct the frontend Docker image using the command below: +#### If you use vLLM based application ```bash -cd GenAIExamples/ChatQnA/ui -docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -cd ../../../.. +nano set_env_vllm.sh ``` -### 7. Build React UI Docker Image (Optional) - -Construct the frontend Docker image using the command below: +#### If you use vLLM based application with FaqGen ```bash -cd GenAIExamples/ChatQnA/ui -docker build --no-cache -t opea/chatqna-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . -cd ../../../.. +nano set_env_vllm_faqgen.sh ``` -### 8. Build Nginx Docker Image +#### If you use TGI based application ```bash -cd GenAIComps -docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . +nano set_env.sh ``` -Then run the command `docker images`, you will have the following 5 Docker Images: - -1. `opea/retriever:latest` -2. `opea/dataprep:latest` -3. `opea/chatqna:latest` -4. `opea/chatqna-ui:latest` or `opea/chatqna-react-ui:latest` -5. `opea/nginx:latest` +#### If you use TGI based application with FaqGen -If FaqGen docker image is built, you will find one more image: +```bash +nano set_env_faqgen.sh +``` -- `opea/llm-faqgen:latest` +If you are in a proxy environment, also set the proxy-related environment variables: -## 🚀 Start MicroServices and MegaService +```bash +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +``` -### Required Models +Set the values of the variables: -By default, the embedding, reranking and LLM models are set to a default value as listed below: +- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. 
-| Service | Model | -| --------- | ----------------------------------- | -| Embedding | BAAI/bge-base-en-v1.5 | -| Reranking | BAAI/bge-reranker-base | -| LLM | meta-llama/Meta-Llama-3-8B-Instruct | + If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. -Change the `xxx_MODEL_ID` below for your needs. + If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. -### Setup Environment Variables + If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. -1. Set the required environment variables: + We set these values in the file set_env\*\*\*\*.sh - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy" - export CHATQNA_HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - # Example: NGINX_PORT=80 - export HOST_IP=${host_ip} - export NGINX_PORT=${your_nginx_port} - export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.4.1-rocm" - export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" - export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" - export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" - export CHATQNA_TGI_SERVICE_PORT=8008 - export CHATQNA_TEI_EMBEDDING_PORT=8090 - export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" - export CHATQNA_TEI_RERANKING_PORT=8808 - export CHATQNA_REDIS_VECTOR_PORT=16379 - export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 - export CHATQNA_REDIS_DATAPREP_PORT=6007 - export CHATQNA_REDIS_RETRIEVER_PORT=7000 - export CHATQNA_LLM_FAQGEN_PORT=9000 - export CHATQNA_INDEX_NAME="rag-redis" - export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} - export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} - export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" - export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" - export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" - export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" - export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} - export CHATQNA_FRONTEND_SERVICE_PORT=5173 - export CHATQNA_BACKEND_SERVICE_NAME=chatqna - export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP} - export CHATQNA_BACKEND_SERVICE_PORT=8888 - export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" - export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} - export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} - export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} - export CHATQNA_NGINX_PORT=5176 - ``` +- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. 
+ The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. -2. If you are in a proxy environment, also set the proxy-related environment variables: +#### Set variables with script set_env\*\*\*\*.sh - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - ``` +#### If you use vLLM based application -3. Note: In order to limit access to a subset of GPUs, please pass each device individually using one or more -device /dev/dri/rendered, where is the card index, starting from 128. (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus) into tgi-service in compose.yaml file +```bash +. set_env_vllm.sh +``` -Example for set isolation for 1 GPU +#### If you use vLLM based application with FaqGen +```bash +. set_env_faqgen_vllm.sh ``` - - /dev/dri/card0:/dev/dri/card0 - - /dev/dri/renderD128:/dev/dri/renderD128 + +#### If you use TGI based application + +```bash +. set_env.sh ``` -Example for set isolation for 2 GPUs +#### If you use TGI based application with FaqGen +```bash +. set_env_faqgen.sh ``` - - /dev/dri/card0:/dev/dri/card0 - - /dev/dri/renderD128:/dev/dri/renderD128 - - /dev/dri/card1:/dev/dri/card1 - - /dev/dri/renderD129:/dev/dri/renderD129 + +### Start the services: + +#### If you use vLLM based application + +```bash +docker compose -f compose_vllm.yaml up -d ``` -Please find more information about accessing and restricting AMD GPUs in the link (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus) +#### If you use vLLM based application with FaqGen -4. 
Set up other environment variables: +```bash +docker compose -f compose_faqgen_vllm.yaml up -d +``` - ```bash - source ./set_env.sh - ``` +#### If you use TGI based application + +```bash +docker compose -f compose.yaml up -d +``` -### Start all the services Docker Containers +#### If you use TGI based application with FaqGen ```bash -cd GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm -## for text generation -docker compose up -d -## for FAQ generation docker compose -f compose_faqgen.yaml up -d ``` -### Validate MicroServices and MegaService +All containers should be running and should not restart: + +##### If you use vLLM based application: + +- chatqna-redis-vector-db +- chatqna-dataprep-service +- chatqna-tei-embedding-service +- chatqna-retriever +- chatqna-tei-reranking-service +- chatqna-vllm-service +- chatqna-backend-server +- chatqna-ui-server +- chatqna-nginx-server + +##### If you use vLLM based application with FaqGen: + +- chatqna-redis-vector-db +- chatqna-dataprep-service +- chatqna-tei-embedding-service +- chatqna-retriever +- chatqna-tei-reranking-service +- chatqna-vllm-service +- chatqna-llm-faqgen +- chatqna-backend-server +- chatqna-ui-server +- chatqna-nginx-server + +##### If you use TGI based application: + +- chatqna-redis-vector-db +- chatqna-dataprep-service +- chatqna-tei-embedding-service +- chatqna-retriever +- chatqna-tei-reranking-service +- chatqna-tgi-service +- chatqna-backend-server +- chatqna-ui-server +- chaqna-nginx-server + +##### If you use TGI based application with FaqGen: + +- chatqna-redis-vector-db +- chatqna-dataprep-service +- chatqna-tei-embedding-service +- chatqna-retriever +- chatqna-tei-reranking-service +- chatqna-tgi-service +- chatqna-llm-faqgen +- chatqna-backend-server +- chatqna-ui-server +- chaqna-nginx-server + +--- + +## Validate the Services + +### 1. Validate TEI Embedding Service -1. TEI Embedding Service - - ```bash - curl ${host_ip}:8090/embed \ - -X POST \ - -d '{"inputs":"What is Deep Learning?"}' \ - -H 'Content-Type: application/json' - ``` +```bash +curl http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}/embed \ + -X POST \ + -d '{"inputs":"What is Deep Learning?"}' \ + -H 'Content-Type: application/json' +``` -2. Retriever Microservice +Checking the response from the service. The response should be similar to text: - To consume the retriever microservice, you need to generate a mock embedding vector by Python script. The length of embedding vector - is determined by the embedding model. - Here we use the model `EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"`, which vector size is 768. +```textmate +[[0.00037115702,-0.06356819,0.0024758505,..................,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]] +``` - Check the vecotor dimension of your embedding model, set `your_embedding` dimension equals to it. +If the service response has a meaningful response in the value, +then we consider the TEI Embedding Service to be successfully launched - ```bash - export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") - curl http://${host_ip}:7000/v1/retrieval \ - -X POST \ - -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ - -H 'Content-Type: application/json' - ``` +### 2. Validate Retriever Microservice -3. 
TEI Reranking Service +```bash +export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") +curl http://${HOST_IP}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval \ + -X POST \ + -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ + -H 'Content-Type: application/json' +``` - ```bash - curl http://${host_ip}:8808/rerank \ - -X POST \ - -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \ - -H 'Content-Type: application/json' - ``` +Checking the response from the service. The response should be similar to JSON: -4. TGI Service +```json +{ "id": "e191846168aed1f80b2ea12df80844d2", "retrieved_docs": [], "initial_query": "test", "top_n": 1, "metadata": [] } +``` - In first startup, this service will take more time to download the model files. After it's finished, the service will be ready. +If the response corresponds to the form of the provided JSON, then we consider the +Retriever Microservice verification successful. - Try the command below to check whether the TGI service is ready. +### 3. Validate TEI Reranking Service - ```bash - docker logs chatqna-tgi-server | grep Connected - ``` +```bash +curl http://${HOST_IP}:${CHATQNA_TEI_RERANKING_PORT}/rerank \ + -X POST \ + -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \ + -H 'Content-Type: application/json' +``` - If the service is ready, you will get the response like below. +Checking the response from the service. The response should be similar to JSON: - ``` - 2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected - ``` +```json +[ + { "index": 1, "score": 0.94238955 }, + { "index": 0, "score": 0.120219156 } +] +``` - Then try the `cURL` command below to validate TGI. +If the response corresponds to the form of the provided JSON, then we consider the TEI Reranking Service +verification successful. - ```bash - curl http://${host_ip}:8008/generate \ - -X POST \ - -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \ - -H 'Content-Type: application/json' - ``` +### 4. Validate the vLLM/TGI Service -5. FaqGen LLM Microservice (if enabled) +#### If you use vLLM: ```bash -curl http://${host_ip}:${CHATQNA_LLM_FAQGEN_PORT}/v1/faqgen \ +DATA='{"model": "meta-llama/Meta-Llama-3-8B-Instruct", '\ +'"messages": [{"role": "user", "content": "What is a Deep Learning?"}], "max_tokens": 64}' + +curl http://${HOST_IP}:${CHATQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ -X POST \ - -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \ + -d "$DATA" \ -H 'Content-Type: application/json' ``` -6. MegaService - - ```bash - curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ - "messages": "What is the revenue of Nike in 2023?" - }' - ``` +Checking the response from the service. 
The response should be similar to JSON: + +```json +{ + "id": "chatcmpl-91003647d1c7469a89e399958f390f67", + "object": "chat.completion", + "created": 1742877228, + "model": "meta-llama/Meta-Llama-3-8B-Instruct", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Deep Learning ( DL) is a subfield of Machine Learning (ML) that focuses on the design of algorithms and architectures inspired by the structure and function of the human brain. These algorithms are designed to analyze and interpret data that is presented in the form of patterns or signals, and they often mimic the way the human brain", + "tool_calls": [] + }, + "logprobs": null, + "finish_reason": "length", + "stop_reason": null + } + ], + "usage": { "prompt_tokens": 16, "total_tokens": 80, "completion_tokens": 64, "prompt_tokens_details": null }, + "prompt_logprobs": null +} +``` -7. Nginx Service +If the service response has a meaningful response in the value of the "choices.message.content" key, +then we consider the vLLM service to be successfully launched - ```bash - curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ - -H "Content-Type: application/json" \ - -d '{"messages": "What is the revenue of Nike in 2023?"}' - ``` +#### If you use TGI: -8. Dataprep Microservice(Optional) +```bash +DATA='{"inputs":"What is a Deep Learning?",'\ +'"parameters":{"max_new_tokens":64,"do_sample": true}}' -If you want to update the default knowledge base, you can use the following commands: +curl http://${HOST_IP}:${CHATQNA_TGI_SERVICE_PORT}/generate \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' +``` -Update Knowledge Base via Local File Upload: +Checking the response from the service. The response should be similar to JSON: -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" +```json +{ + "generated_text": " What is its application in Computer Vision?\nWhat is a Deep Learning?\nDeep learning is a subfield of machine learning that involves the use of artificial neural networks to model high-level abstractions in data. It involves the use of deep neural networks, which are composed of multiple layers, to learn complex patterns in data. The" +} ``` -This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment. +If the service response has a meaningful response in the value of the "generated_text" key, +then we consider the TGI service to be successfully launched -Add Knowledge Base via HTTP Links: +### 5. Validate the LLM Service (if your used application with FaqGen) ```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` +DATA='{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source '\ +'text embeddings and sequence classification models. TEI enables high-performance extraction for the most '\ +'popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens": 128}' -This command updates a knowledge base by submitting a list of HTTP links for processing. +curl http://${HOST_IP}:${CHATQNA_LLM_FAQGEN_PORT}/v1/faqgen \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' +``` -Also, you are able to get the file list that you uploaded: +Checking the response from the service. 
The response should be similar to JSON: -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get" \ - -H "Content-Type: application/json" +```json +{ + "id": "58f0632f5f03af31471b895b0d0d397b", + "text": " Q: What is Text Embeddings Inference (TEI)?\n A: TEI is a toolkit for deploying and serving open source text embeddings and sequence classification models.\n\n Q: What models does TEI support?\n A: TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.\n\n Q: What is the purpose of TEI?\n A: The purpose of TEI is to enable high-performance extraction for text embeddings and sequence classification models.\n\n Q: What are the benefits of using TEI?\n A: The benefits of using TEI include high", + "prompt": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." +} ``` -To delete the file/link you uploaded: +If the service response has a meaningful response in the value of the "text" key, +then we consider the LLM service to be successfully launched + +### 6. Validate the MegaService ```bash -# delete link -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \ - -d '{"file_path": "https://opea.dev"}' \ - -H "Content-Type: application/json" +curl http://${HOST_IP}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' +``` -# delete file -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" +Checking the response from the service. The response should be similar to text: + +```textmate +data: b' What' +data: b' is' +data: b' the' +data: b' revenue' +data: b' of' +data: b' Nike' +data: b' in' +data: b' ' +data: b'202' +data: b'3' +data: b'?\n' +data: b' ' +data: b' Answer' +data: b':' +data: b' According' +data: b' to' +data: b' the' +data: b' search' +data: b' results' +data: b',' +data: b' the' +data: b' revenue' +data: b' of' +data: b'' + +data: [DONE] -# delete all uploaded files and links -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" ``` -## 🚀 Launch the UI +If the output lines in the "data" keys contain words (tokens) containing meaning, then the service +is considered launched successfully. -### Launch with origin port +### 7. Validate the Frontend (UI) -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${CHATQNA_NGINX_PORT} +A page should open when you click through to this address: -```yaml - chaqna-ui-server: - image: opea/chatqna-ui:latest - ... - ports: - - "80:5173" -``` +![UI start page](../../../../assets/img/ui-starting-page.png) -### Launch with Nginx +If a page of this type has opened, then we believe that the service is running and responding, +and we can proceed to functional UI testing. -If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend. 
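Optionally, you can first confirm from the command line that the Nginx frontend endpoint is reachable before opening it in a browser (a minimal check, assuming `curl` is available on the host):

```bash
# Print only the HTTP status code returned by the frontend (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" http://${EXTERNAL_HOST_IP}:${CHATQNA_NGINX_PORT}
```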
+Let's enter the task for the service in the "Enter prompt here" field. +For example, "What is a Deep Learning?" and press Enter. +After that, a page with the result of the task should open: -## 🚀 Launch the Conversational UI (Optional) +#### If used application without FaqGen -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-ui-server` service with the `chatqna-react-ui-server` service as per the config below: +![UI result page](../../../../assets/img/ui-result-page.png) -```yaml -chatqna-react-ui-server: - image: opea/chatqna-react-ui:latest - container_name: chatqna-react-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:80" - depends_on: - - chaqna-backend-server - ipc: host - restart: always -``` - -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +#### If used application with FaqGen -```yaml - chaqna-react-ui-server: - image: opea/chatqna-react-ui:latest - ... - ports: - - "80:80" +![UI result page](../../../../assets/img/ui-result-page-faqgen.png) + +If the result shown on the page is correct, then we consider the verification of the UI service to be successful. + +### 5. Stop application + +#### If you use vLLM + +```bash +cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm +docker compose -f compose_vllm.yaml down ``` -![project-screenshot](../../../../assets/img/chat_ui_init.png) +#### If you use vLLM with FaqGen + +```bash +cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm +docker compose -f compose_faqgen_vllm.yaml down +``` -Here is an example of running ChatQnA: +#### If you use TGI -![project-screenshot](../../../../assets/img/chat_ui_response.png) +```bash +cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm +docker compose -f compose.yaml down +``` -Here is an example of running ChatQnA with Conversational UI (React): +#### If you use TGI with FaqGen -![project-screenshot](../../../../assets/img/conversation_ui_response.png) +```bash +cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm +docker compose -f compose_faqgen.yaml down +``` diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/compose.yaml b/ChatQnA/docker_compose/amd/gpu/rocm/compose.yaml index da1f4ddda4..2a203ddf1e 100644 --- a/ChatQnA/docker_compose/amd/gpu/rocm/compose.yaml +++ b/ChatQnA/docker_compose/amd/gpu/rocm/compose.yaml @@ -4,13 +4,14 @@ services: chatqna-redis-vector-db: image: redis/redis-stack:7.2.0-v9 - container_name: redis-vector-db + container_name: chatqna-redis-vector-db ports: - "${CHATQNA_REDIS_VECTOR_PORT}:6379" - "${CHATQNA_REDIS_VECTOR_INSIGHT_PORT}:8001" - chatqna-dataprep-redis-service: + + chatqna-dataprep-service: image: ${REGISTRY:-opea}/dataprep:${TAG:-latest} - container_name: dataprep-redis-server + container_name: chatqna-dataprep-service depends_on: - chatqna-redis-vector-db - chatqna-tei-embedding-service @@ -24,13 +25,14 @@ services: INDEX_NAME: ${CHATQNA_INDEX_NAME} TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + chatqna-tei-embedding-service: image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 - container_name: 
chatqna-tei-embedding-server + container_name: chatqna-tei-embedding-service ports: - "${CHATQNA_TEI_EMBEDDING_PORT}:80" volumes: - - "${MODEL_CACHE:-/var/opea/chatqna-service/data}:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 1g ipc: host environment: @@ -38,19 +40,10 @@ services: http_proxy: ${http_proxy} https_proxy: ${https_proxy} command: --model-id ${CHATQNA_EMBEDDING_MODEL_ID} --auto-truncate - devices: - - /dev/kfd:/dev/kfd - - /dev/dri/card1:/dev/dri/card1 - - /dev/dri/renderD136:/dev/dri/renderD136 - cap_add: - - SYS_PTRACE - group_add: - - video - security_opt: - - seccomp:unconfined + chatqna-retriever: image: ${REGISTRY:-opea}/retriever:${TAG:-latest} - container_name: chatqna-retriever-redis-server + container_name: chatqna-retriever depends_on: - chatqna-redis-vector-db ports: @@ -63,16 +56,18 @@ services: REDIS_URL: ${CHATQNA_REDIS_URL} INDEX_NAME: ${CHATQNA_INDEX_NAME} TEI_EMBEDDING_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} LOGFLAG: ${LOGFLAG} RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS" restart: unless-stopped + chatqna-tei-reranking-service: image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 - container_name: chatqna-tei-reranking-server + container_name: chatqna-tei-reranking-service ports: - "${CHATQNA_TEI_RERANKING_PORT}:80" volumes: - - "${MODEL_CACHE:-/var/opea/chatqna-service/data}:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 1g environment: no_proxy: ${no_proxy} @@ -81,19 +76,11 @@ services: HF_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} HF_HUB_DISABLE_PROGRESS_BARS: 1 HF_HUB_ENABLE_HF_TRANSFER: 0 - devices: - - /dev/kfd:/dev/kfd - - /dev/dri/:/dev/dri/ - cap_add: - - SYS_PTRACE - group_add: - - video - security_opt: - - seccomp:unconfined command: --model-id ${CHATQNA_RERANK_MODEL_ID} --auto-truncate + chatqna-tgi-service: - image: ${CHATQNA_TGI_SERVICE_IMAGE} - container_name: chatqna-tgi-server + image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + container_name: chatqna-tgi-service ports: - "${CHATQNA_TGI_SERVICE_PORT}:80" environment: @@ -104,11 +91,11 @@ services: HF_HUB_DISABLE_PROGRESS_BARS: 1 HF_HUB_ENABLE_HF_TRANSFER: 0 volumes: - - "${MODEL_CACHE:-/var/opea/chatqna-service/data}:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 1g devices: - /dev/kfd:/dev/kfd - - /dev/dri/:/dev/dri/ + - /dev/dri:/dev/dri cap_add: - SYS_PTRACE group_add: @@ -117,6 +104,7 @@ services: - seccomp:unconfined command: --model-id ${CHATQNA_LLM_MODEL_ID} ipc: host + chatqna-backend-server: image: ${REGISTRY:-opea}/chatqna:${TAG:-latest} container_name: chatqna-backend-server @@ -127,39 +115,41 @@ services: - chatqna-tei-reranking-service - chatqna-tgi-service ports: - - "${CHATQNA_BACKEND_SERVICE_PORT}:8888" + - "${CHATQNA_BACKEND_SERVICE_PORT:-8888}:8888" environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - MEGA_SERVICE_HOST_IP=${CHATQNA_MEGA_SERVICE_HOST_IP} - - EMBEDDING_SERVER_HOST_IP=${HOST_IP} - - EMBEDDING_SERVER_PORT=${CHATQNA_TEI_EMBEDDING_PORT:-80} - - RETRIEVER_SERVICE_HOST_IP=${HOST_IP} - - RERANK_SERVER_HOST_IP=${HOST_IP} - - RERANK_SERVER_PORT=${CHATQNA_TEI_RERANKING_PORT:-80} - - LLM_SERVER_HOST_IP=${HOST_IP} - - LLM_SERVER_PORT=${CHATQNA_TGI_SERVICE_PORT:-80} - - LLM_MODEL=${CHATQNA_LLM_MODEL_ID} + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + MEGA_SERVICE_HOST_IP: ${CHATQNA_MEGA_SERVICE_HOST_IP} + EMBEDDING_SERVER_HOST_IP: ${HOST_IP} + EMBEDDING_SERVER_PORT: 
${CHATQNA_TEI_EMBEDDING_PORT:-80} + RETRIEVER_SERVICE_HOST_IP: ${HOST_IP} + RERANK_SERVER_HOST_IP: ${HOST_IP} + RERANK_SERVER_PORT: ${CHATQNA_TEI_RERANKING_PORT:-80} + LLM_SERVER_HOST_IP: ${HOST_IP} + LLM_SERVER_PORT: ${CHATQNA_TGI_SERVICE_PORT:-80} + LLM_MODEL: ${CHATQNA_LLM_MODEL_ID} ipc: host restart: always + chatqna-ui-server: image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest} container_name: chatqna-ui-server depends_on: - chatqna-backend-server ports: - - "${CHATQNA_FRONTEND_SERVICE_PORT}:5173" + - "${CHATQNA_FRONTEND_SERVICE_PORT:-5173}:5173" environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - CHAT_BASE_URL=${CHATQNA_BACKEND_SERVICE_ENDPOINT} - - UPLOAD_FILE_BASE_URL=${CHATQNA_DATAPREP_SERVICE_ENDPOINT} - - GET_FILE=${CHATQNA_DATAPREP_GET_FILE_ENDPOINT} - - DELETE_FILE=${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT} + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + CHAT_BASE_URL: ${CHATQNA_BACKEND_SERVICE_ENDPOINT} + UPLOAD_FILE_BASE_URL: ${CHATQNA_DATAPREP_SERVICE_ENDPOINT} + GET_FILE: ${CHATQNA_DATAPREP_GET_FILE_ENDPOINT} + DELETE_FILE: ${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT} ipc: host restart: always + chatqna-nginx-server: image: ${REGISTRY:-opea}/nginx:${TAG:-latest} container_name: chaqna-nginx-server @@ -169,14 +159,14 @@ services: ports: - "${CHATQNA_NGINX_PORT}:80" environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - FRONTEND_SERVICE_IP=${CHATQNA_FRONTEND_SERVICE_IP} - - FRONTEND_SERVICE_PORT=${CHATQNA_FRONTEND_SERVICE_PORT} - - BACKEND_SERVICE_NAME=${CHATQNA_BACKEND_SERVICE_NAME} - - BACKEND_SERVICE_IP=${CHATQNA_BACKEND_SERVICE_IP} - - BACKEND_SERVICE_PORT=${CHATQNA_BACKEND_SERVICE_PORT} + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + FRONTEND_SERVICE_IP: ${CHATQNA_FRONTEND_SERVICE_IP} + FRONTEND_SERVICE_PORT: ${CHATQNA_FRONTEND_SERVICE_PORT} + BACKEND_SERVICE_NAME: ${CHATQNA_BACKEND_SERVICE_NAME} + BACKEND_SERVICE_IP: ${CHATQNA_BACKEND_SERVICE_IP} + BACKEND_SERVICE_PORT: ${CHATQNA_BACKEND_SERVICE_PORT} ipc: host restart: always diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen.yaml b/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen.yaml index bb1f545f79..ae726f1208 100644 --- a/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen.yaml +++ b/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen.yaml @@ -4,13 +4,14 @@ services: chatqna-redis-vector-db: image: redis/redis-stack:7.2.0-v9 - container_name: redis-vector-db + container_name: chatqna-redis-vector-db ports: - "${CHATQNA_REDIS_VECTOR_PORT}:6379" - "${CHATQNA_REDIS_VECTOR_INSIGHT_PORT}:8001" - chatqna-dataprep-redis-service: + + chatqna-dataprep-service: image: ${REGISTRY:-opea}/dataprep:${TAG:-latest} - container_name: dataprep-redis-server + container_name: chatqna-dataprep-service depends_on: - chatqna-redis-vector-db - chatqna-tei-embedding-service @@ -24,13 +25,14 @@ services: INDEX_NAME: ${CHATQNA_INDEX_NAME} TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + chatqna-tei-embedding-service: image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 - container_name: chatqna-tei-embedding-server + container_name: chatqna-tei-embedding-service ports: - "${CHATQNA_TEI_EMBEDDING_PORT}:80" volumes: - - "${MODEL_CACHE:-/var/opea/chatqna-service/data}:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 1g ipc: host environment: @@ -38,19 +40,10 @@ services: http_proxy: ${http_proxy} 
https_proxy: ${https_proxy} command: --model-id ${CHATQNA_EMBEDDING_MODEL_ID} --auto-truncate - devices: - - /dev/kfd:/dev/kfd - - /dev/dri/card1:/dev/dri/card1 - - /dev/dri/renderD136:/dev/dri/renderD136 - cap_add: - - SYS_PTRACE - group_add: - - video - security_opt: - - seccomp:unconfined + chatqna-retriever: image: ${REGISTRY:-opea}/retriever:${TAG:-latest} - container_name: chatqna-retriever-redis-server + container_name: chatqna-retriever depends_on: - chatqna-redis-vector-db ports: @@ -63,16 +56,18 @@ services: REDIS_URL: ${CHATQNA_REDIS_URL} INDEX_NAME: ${CHATQNA_INDEX_NAME} TEI_EMBEDDING_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} LOGFLAG: ${LOGFLAG} RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS" restart: unless-stopped + chatqna-tei-reranking-service: image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 - container_name: chatqna-tei-reranking-server + container_name: chatqna-tei-reranking-service ports: - "${CHATQNA_TEI_RERANKING_PORT}:80" volumes: - - "${MODEL_CACHE:-/var/opea/chatqna-service/data}:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 1g environment: no_proxy: ${no_proxy} @@ -81,19 +76,11 @@ services: HF_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} HF_HUB_DISABLE_PROGRESS_BARS: 1 HF_HUB_ENABLE_HF_TRANSFER: 0 - devices: - - /dev/kfd:/dev/kfd - - /dev/dri/:/dev/dri/ - cap_add: - - SYS_PTRACE - group_add: - - video - security_opt: - - seccomp:unconfined command: --model-id ${CHATQNA_RERANK_MODEL_ID} --auto-truncate + chatqna-tgi-service: - image: ${CHATQNA_TGI_SERVICE_IMAGE} - container_name: chatqna-tgi-server + image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + container_name: chatqna-tgi-service ports: - "${CHATQNA_TGI_SERVICE_PORT}:80" environment: @@ -104,11 +91,11 @@ services: HF_HUB_DISABLE_PROGRESS_BARS: 1 HF_HUB_ENABLE_HF_TRANSFER: 0 volumes: - - "${MODEL_CACHE:-/var/opea/chatqna-service/data}:/data" + - "${MODEL_CACHE:-./data}:/data" shm_size: 1g devices: - /dev/kfd:/dev/kfd - - /dev/dri/:/dev/dri/ + - /dev/dri:/dev/dri cap_add: - SYS_PTRACE group_add: @@ -117,9 +104,10 @@ services: - seccomp:unconfined command: --model-id ${CHATQNA_LLM_MODEL_ID} ipc: host + chatqna-llm-faqgen: image: ${REGISTRY:-opea}/llm-faqgen:${TAG:-latest} - container_name: llm-faqgen-server + container_name: chatqna-llm-faqgen depends_on: - chatqna-tgi-service ports: @@ -129,12 +117,13 @@ services: no_proxy: ${no_proxy} http_proxy: ${http_proxy} https_proxy: ${https_proxy} - LLM_ENDPOINT: ${LLM_ENDPOINT} - LLM_MODEL_ID: ${LLM_MODEL_ID} - HF_TOKEN: ${HF_TOKEN} + LLM_ENDPOINT: ${CHATQNA_LLM_ENDPOINT} + LLM_MODEL_ID: ${CHATQNA_LLM_MODEL_ID} + HF_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} FAQGen_COMPONENT_NAME: ${FAQGen_COMPONENT_NAME:-OpeaFaqGenTgi} LOGFLAG: ${LOGFLAG:-False} restart: unless-stopped + chatqna-backend-server: image: ${REGISTRY:-opea}/chatqna:${TAG:-latest} container_name: chatqna-backend-server @@ -148,21 +137,22 @@ services: ports: - "${CHATQNA_BACKEND_SERVICE_PORT}:8888" environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - MEGA_SERVICE_HOST_IP=${CHATQNA_MEGA_SERVICE_HOST_IP} - - EMBEDDING_SERVER_HOST_IP=${HOST_IP} - - EMBEDDING_SERVER_PORT=${CHATQNA_TEI_EMBEDDING_PORT:-80} - - RETRIEVER_SERVICE_HOST_IP=${HOST_IP} - - RERANK_SERVER_HOST_IP=${HOST_IP} - - RERANK_SERVER_PORT=${CHATQNA_TEI_RERANKING_PORT:-80} - - LLM_SERVER_HOST_IP=${HOST_IP} - - LLM_SERVER_PORT=${CHATQNA_LLM_FAQGEN_PORT:-9000} - - LLM_MODEL=${CHATQNA_LLM_MODEL_ID} - - 
CHATQNA_TYPE=${CHATQNA_TYPE:-CHATQNA_FAQGEN} + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + MEGA_SERVICE_HOST_IP: ${CHATQNA_MEGA_SERVICE_HOST_IP} + EMBEDDING_SERVER_HOST_IP: ${HOST_IP} + EMBEDDING_SERVER_PORT: ${CHATQNA_TEI_EMBEDDING_PORT:-80} + RETRIEVER_SERVICE_HOST_IP: ${HOST_IP} + RERANK_SERVER_HOST_IP: ${HOST_IP} + RERANK_SERVER_PORT: ${CHATQNA_TEI_RERANKING_PORT:-80} + LLM_SERVER_HOST_IP: ${HOST_IP} + LLM_SERVER_PORT: ${CHATQNA_LLM_FAQGEN_PORT:-9000} + LLM_MODEL: ${CHATQNA_LLM_MODEL_ID} + CHATQNA_TYPE: ${CHATQNA_TYPE:-CHATQNA_FAQGEN} ipc: host restart: always + chatqna-ui-server: image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest} container_name: chatqna-ui-server @@ -171,15 +161,16 @@ services: ports: - "${CHATQNA_FRONTEND_SERVICE_PORT}:5173" environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - CHAT_BASE_URL=${CHATQNA_BACKEND_SERVICE_ENDPOINT} - - UPLOAD_FILE_BASE_URL=${CHATQNA_DATAPREP_SERVICE_ENDPOINT} - - GET_FILE=${CHATQNA_DATAPREP_GET_FILE_ENDPOINT} - - DELETE_FILE=${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT} + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + CHAT_BASE_URL: ${CHATQNA_BACKEND_SERVICE_ENDPOINT} + UPLOAD_FILE_BASE_URL: ${CHATQNA_DATAPREP_SERVICE_ENDPOINT} + GET_FILE: ${CHATQNA_DATAPREP_GET_FILE_ENDPOINT} + DELETE_FILE: ${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT} ipc: host restart: always + chatqna-nginx-server: image: ${REGISTRY:-opea}/nginx:${TAG:-latest} container_name: chaqna-nginx-server @@ -189,14 +180,14 @@ services: ports: - "${CHATQNA_NGINX_PORT}:80" environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - FRONTEND_SERVICE_IP=${CHATQNA_FRONTEND_SERVICE_IP} - - FRONTEND_SERVICE_PORT=${CHATQNA_FRONTEND_SERVICE_PORT} - - BACKEND_SERVICE_NAME=${CHATQNA_BACKEND_SERVICE_NAME} - - BACKEND_SERVICE_IP=${CHATQNA_BACKEND_SERVICE_IP} - - BACKEND_SERVICE_PORT=${CHATQNA_BACKEND_SERVICE_PORT} + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + FRONTEND_SERVICE_IP: ${CHATQNA_FRONTEND_SERVICE_IP} + FRONTEND_SERVICE_PORT: ${CHATQNA_FRONTEND_SERVICE_PORT} + BACKEND_SERVICE_NAME: ${CHATQNA_BACKEND_SERVICE_NAME} + BACKEND_SERVICE_IP: ${CHATQNA_BACKEND_SERVICE_IP} + BACKEND_SERVICE_PORT: ${CHATQNA_BACKEND_SERVICE_PORT} ipc: host restart: always diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen_vllm.yaml b/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen_vllm.yaml new file mode 100644 index 0000000000..6d7d0cd023 --- /dev/null +++ b/ChatQnA/docker_compose/amd/gpu/rocm/compose_faqgen_vllm.yaml @@ -0,0 +1,201 @@ +# Copyright (C) 2025 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0 + +services: + chatqna-redis-vector-db: + image: redis/redis-stack:7.2.0-v9 + container_name: chatqna-redis-vector-db + ports: + - "${CHATQNA_REDIS_VECTOR_PORT}:6379" + - "${CHATQNA_REDIS_VECTOR_INSIGHT_PORT}:8001" + + chatqna-dataprep-redis-service: + image: ${REGISTRY:-opea}/dataprep:${TAG:-latest} + container_name: chatqna-dataprep-service + depends_on: + - chatqna-redis-vector-db + - chatqna-tei-embedding-service + ports: + - "${CHATQNA_REDIS_DATAPREP_PORT}:5000" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + REDIS_URL: ${CHATQNA_REDIS_URL} + INDEX_NAME: ${CHATQNA_INDEX_NAME} + TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + + chatqna-tei-embedding-service: + image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + container_name: chatqna-tei-embedding-service + ports: + - "${CHATQNA_TEI_EMBEDDING_PORT}:80" + volumes: + - "${MODEL_CACHE:-./data}:/data" + shm_size: 1g + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + command: --model-id ${CHATQNA_EMBEDDING_MODEL_ID} --auto-truncate + + chatqna-retriever: + image: ${REGISTRY:-opea}/retriever:${TAG:-latest} + container_name: chatqna-retriever + depends_on: + - chatqna-redis-vector-db + ports: + - "${CHATQNA_REDIS_RETRIEVER_PORT}:7000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + REDIS_URL: ${CHATQNA_REDIS_URL} + INDEX_NAME: ${CHATQNA_INDEX_NAME} + TEI_EMBEDDING_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + LOGFLAG: ${LOGFLAG} + RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS" + restart: unless-stopped + + chatqna-tei-reranking-service: + image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + container_name: chatqna-tei-reranking-service + ports: + - "${CHATQNA_TEI_RERANKING_PORT}:80" + volumes: + - "${MODEL_CACHE:-./data}:/data" + shm_size: 1g + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HF_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + command: --model-id ${CHATQNA_RERANK_MODEL_ID} --auto-truncate + + chatqna-vllm-service: + image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest} + container_name: chatqna-vllm-service + ports: + - "${CHATQNA_VLLM_SERVICE_PORT:-8081}:8011" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HF_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + WILM_USE_TRITON_FLASH_ATTENTION: 0 + PYTORCH_JIT: 0 + volumes: + - "${MODEL_CACHE:-./data}:/data" + shm_size: 128G + devices: + - /dev/kfd:/dev/kfd + - /dev/dri:/dev/dri + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + - apparmor=unconfined + command: "--model ${CHATQNA_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\"" + ipc: host + + chatqna-llm-faqgen: + image: ${REGISTRY:-opea}/llm-faqgen:${TAG:-latest} + container_name: chatqna-llm-faqgen + depends_on: + - chatqna-vllm-service + ports: + - 
${CHATQNA_LLM_FAQGEN_PORT:-9000}:9000 + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + LLM_ENDPOINT: ${LLM_ENDPOINT} + LLM_MODEL_ID: ${CHATQNA_LLM_MODEL_ID} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + FAQGen_COMPONENT_NAME: ${FAQGen_COMPONENT_NAME:-OpeaFaqGenvLLM} + LOGFLAG: ${LOGFLAG:-False} + restart: unless-stopped + + chatqna-backend-server: + image: ${REGISTRY:-opea}/chatqna:${TAG:-latest} + container_name: chatqna-backend-server + depends_on: + - chatqna-redis-vector-db + - chatqna-tei-embedding-service + - chatqna-retriever + - chatqna-tei-reranking-service + - chatqna-vllm-service + - chatqna-llm-faqgen + ports: + - "${CHATQNA_BACKEND_SERVICE_PORT}:8888" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + MEGA_SERVICE_HOST_IP: ${CHATQNA_MEGA_SERVICE_HOST_IP} + EMBEDDING_SERVER_HOST_IP: ${HOST_IP} + EMBEDDING_SERVER_PORT: ${CHATQNA_TEI_EMBEDDING_PORT:-80} + RETRIEVER_SERVICE_HOST_IP: ${HOST_IP} + RERANK_SERVER_HOST_IP: ${HOST_IP} + RERANK_SERVER_PORT: ${CHATQNA_TEI_RERANKING_PORT:-80} + LLM_SERVER_HOST_IP: ${HOST_IP} + LLM_SERVER_PORT: ${CHATQNA_LLM_FAQGEN_PORT:-9000} + LLM_MODEL: ${CHATQNA_LLM_MODEL_ID} + CHATQNA_TYPE: ${CHATQNA_TYPE:-CHATQNA_FAQGEN} + ipc: host + restart: always + + chatqna-ui-server: + image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest} + container_name: chatqna-ui-server + depends_on: + - chatqna-backend-server + ports: + - "${CHATQNA_FRONTEND_SERVICE_PORT}:5173" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + CHAT_BASE_URL: ${CHATQNA_BACKEND_SERVICE_ENDPOINT} + UPLOAD_FILE_BASE_URL: ${CHATQNA_DATAPREP_SERVICE_ENDPOINT} + GET_FILE: ${CHATQNA_DATAPREP_GET_FILE_ENDPOINT} + DELETE_FILE: ${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT} + ipc: host + restart: always + + chatqna-nginx-server: + image: ${REGISTRY:-opea}/nginx:${TAG:-latest} + container_name: chaqna-nginx-server + depends_on: + - chatqna-backend-server + - chatqna-ui-server + ports: + - "${CHATQNA_NGINX_PORT}:80" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + FRONTEND_SERVICE_IP: ${CHATQNA_FRONTEND_SERVICE_IP} + FRONTEND_SERVICE_PORT: ${CHATQNA_FRONTEND_SERVICE_PORT} + BACKEND_SERVICE_NAME: ${CHATQNA_BACKEND_SERVICE_NAME} + BACKEND_SERVICE_IP: ${CHATQNA_BACKEND_SERVICE_IP} + BACKEND_SERVICE_PORT: ${CHATQNA_BACKEND_SERVICE_PORT} + ipc: host + restart: always + +networks: + default: + driver: bridge diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml b/ChatQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml new file mode 100644 index 0000000000..51cb00229f --- /dev/null +++ b/ChatQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml @@ -0,0 +1,177 @@ +# Copyright (C) 2024 Advanced Micro Devices, Inc. 
+# SPDX-License-Identifier: Apache-2.0 + +services: + chatqna-redis-vector-db: + image: redis/redis-stack:7.2.0-v9 + container_name: chatqna-redis-vector-db + ports: + - "${CHATQNA_REDIS_VECTOR_PORT:-6379}:6379" + - "${CHATQNA_REDIS_VECTOR_INSIGHT_PORT:-8001}:8001" + + chatqna-dataprep-service: + image: ${REGISTRY:-opea}/dataprep:${TAG:-latest} + container_name: chatqna-dataprep-service + depends_on: + - chatqna-redis-vector-db + - chatqna-tei-embedding-service + ports: + - "${CHATQNA_REDIS_DATAPREP_PORT:-5000}:5000" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + REDIS_URL: ${CHATQNA_REDIS_URL} + INDEX_NAME: ${CHATQNA_INDEX_NAME} + TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + + chatqna-tei-embedding-service: + image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + container_name: chatqna-tei-embedding-service + ports: + - "${CHATQNA_TEI_EMBEDDING_PORT}:80" + volumes: + - "${MODEL_CACHE:-./data}:/data" + shm_size: 1g + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + command: --model-id ${CHATQNA_EMBEDDING_MODEL_ID} --auto-truncate + + chatqna-retriever: + image: ${REGISTRY:-opea}/retriever:${TAG:-latest} + container_name: chatqna-retriever + depends_on: + - chatqna-redis-vector-db + ports: + - "${CHATQNA_REDIS_RETRIEVER_PORT:-7000}:7000" + ipc: host + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + REDIS_URL: ${CHATQNA_REDIS_URL} + INDEX_NAME: ${CHATQNA_INDEX_NAME} + TEI_EMBEDDING_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + restart: unless-stopped + + chatqna-tei-reranking-service: + image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 + container_name: chatqna-tei-reranking-service + ports: + - "${CHATQNA_TEI_RERANKING_PORT}:80" + volumes: + - "${MODEL_CACHE:-./data}:/data" + shm_size: 1g + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + command: --model-id ${CHATQNA_RERANK_MODEL_ID} --auto-truncate + + chatqna-vllm-service: + image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest} + container_name: chatqna-vllm-service + ports: + - "${CHATQNA_VLLM_SERVICE_PORT:-8081}:8011" + environment: + no_proxy: ${no_proxy} + http_proxy: ${http_proxy} + https_proxy: ${https_proxy} + HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HF_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 + WILM_USE_TRITON_FLASH_ATTENTION: 0 + PYTORCH_JIT: 0 + volumes: + - "${MODEL_CACHE:-./data}:/data" + shm_size: 128G + devices: + - /dev/kfd:/dev/kfd + - /dev/dri:/dev/dri + cap_add: + - SYS_PTRACE + group_add: + - video + security_opt: + - seccomp:unconfined + - apparmor=unconfined + command: "--model ${CHATQNA_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\"" + ipc: host + + chatqna-backend-server: + image: ${REGISTRY:-opea}/chatqna:${TAG:-latest} + container_name: chatqna-backend-server + depends_on: + - chatqna-redis-vector-db + - chatqna-tei-embedding-service + - chatqna-retriever + - chatqna-tei-reranking-service + - chatqna-vllm-service 
+ ports: + - "${CHATQNA_BACKEND_SERVICE_PORT}:8888" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + MEGA_SERVICE_HOST_IP: ${CHATQNA_MEGA_SERVICE_HOST_IP} + EMBEDDING_SERVER_HOST_IP: ${HOST_IP} + EMBEDDING_SERVER_PORT: ${CHATQNA_TEI_EMBEDDING_PORT:-80} + RETRIEVER_SERVICE_HOST_IP: ${HOST_IP} + RERANK_SERVER_HOST_IP: ${HOST_IP} + RERANK_SERVER_PORT: ${CHATQNA_TEI_RERANKING_PORT:-80} + LLM_SERVER_HOST_IP: ${HOST_IP} + LLM_SERVER_PORT: ${CHATQNA_VLLM_SERVICE_PORT:-80} + LLM_MODEL: ${CHATQNA_LLM_MODEL_ID} + ipc: host + restart: always + + chatqna-ui-server: + image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest} + container_name: chatqna-ui-server + depends_on: + - chatqna-backend-server + ports: + - "${CHATQNA_FRONTEND_SERVICE_PORT}:5173" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + CHAT_BASE_URL: ${CHATQNA_BACKEND_SERVICE_ENDPOINT} + UPLOAD_FILE_BASE_URL: ${CHATQNA_DATAPREP_SERVICE_ENDPOINT} + GET_FILE: ${CHATQNA_DATAPREP_GET_FILE_ENDPOINT} + DELETE_FILE: ${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT} + ipc: host + restart: always + + chatqna-nginx-server: + image: ${REGISTRY:-opea}/nginx:${TAG:-latest} + container_name: chatqna-nginx-server + depends_on: + - chatqna-backend-server + - chatqna-ui-server + ports: + - "${CHATQNA_NGINX_PORT}:80" + environment: + no_proxy: ${no_proxy} + https_proxy: ${https_proxy} + http_proxy: ${http_proxy} + FRONTEND_SERVICE_IP: ${CHATQNA_FRONTEND_SERVICE_IP} + FRONTEND_SERVICE_PORT: ${CHATQNA_FRONTEND_SERVICE_PORT} + BACKEND_SERVICE_NAME: ${CHATQNA_BACKEND_SERVICE_NAME} + BACKEND_SERVICE_IP: ${CHATQNA_BACKEND_SERVICE_IP} + BACKEND_SERVICE_PORT: ${CHATQNA_BACKEND_SERVICE_PORT} + ipc: host + restart: always + +networks: + default: + driver: bridge diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/set_env.sh b/ChatQnA/docker_compose/amd/gpu/rocm/set_env.sh index 21f2de92bb..5691d8fa48 100644 --- a/ChatQnA/docker_compose/amd/gpu/rocm/set_env.sh +++ b/ChatQnA/docker_compose/amd/gpu/rocm/set_env.sh @@ -1,35 +1,39 @@ #!/usr/bin/env bash -# Copyright (C) 2024 Advanced Micro Devices, Inc. -# SPDX-License-Identifier: Apache-2.0 +# Copyright (C) 2025 Advanced Micro Devices, Inc. 
+ +export HOST_IP='' +export HOST_IP_EXTERNAL='' -export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm" export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" -export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" -export CHATQNA_TGI_SERVICE_PORT=18008 +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=18102 +export CHATQNA_FRONTEND_SERVICE_PORT=18101 +export CHATQNA_NGINX_PORT=18104 +export CHATQNA_REDIS_DATAPREP_PORT=18103 +export CHATQNA_REDIS_RETRIEVER_PORT=7000 +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 export CHATQNA_TEI_EMBEDDING_PORT=18090 -export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" export CHATQNA_TEI_RERANKING_PORT=18808 -export CHATQNA_REDIS_VECTOR_PORT=16379 -export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 -export CHATQNA_REDIS_DATAPREP_PORT=6007 -export CHATQNA_REDIS_RETRIEVER_PORT=7000 -export CHATQNA_LLM_FAQGEN_PORT=18010 -export CHATQNA_INDEX_NAME="rag-redis" -export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" -export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" -export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" -export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" -export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} -export CHATQNA_FRONTEND_SERVICE_PORT=15173 -export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_TGI_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP} -export CHATQNA_BACKEND_SERVICE_PORT=18888 -export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_NGINX_PORT=15176 +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/set_env_faqgen.sh b/ChatQnA/docker_compose/amd/gpu/rocm/set_env_faqgen.sh new file mode 100644 index 0000000000..6361f5a9fd --- /dev/null +++ b/ChatQnA/docker_compose/amd/gpu/rocm/set_env_faqgen.sh @@ -0,0 +1,42 @@ +#!/usr/bin/env bash + +# Copyright (C) 2025 Advanced Micro Devices, Inc. 
+ +export HOST_IP='' +export HOST_IP_EXTERNAL='' + +export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=18102 +export CHATQNA_FRONTEND_SERVICE_PORT=18101 +export CHATQNA_LLM_FAQGEN_PORT=18011 +export CHATQNA_NGINX_PORT=18104 +export CHATQNA_REDIS_DATAPREP_PORT=18103 +export CHATQNA_REDIS_RETRIEVER_PORT=7000 +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_TGI_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" +export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP} +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" +export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} +export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_LLM_ENDPOINT="http://${HOST_IP}:${CHATQNA_TGI_SERVICE_PORT}" +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" +export FAQGen_COMPONENT_NAME="OpeaFaqGenTgi" diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/set_env_faqgen_vllm.sh b/ChatQnA/docker_compose/amd/gpu/rocm/set_env_faqgen_vllm.sh new file mode 100644 index 0000000000..20dd880b2d --- /dev/null +++ b/ChatQnA/docker_compose/amd/gpu/rocm/set_env_faqgen_vllm.sh @@ -0,0 +1,39 @@ +#!/usr/bin/env bash + +# Copyright (C) 2025 Advanced Micro Devices, Inc. 
+ +export HOST_IP='' +export HOST_IP_EXTERNAL='' + +export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=18102 +export CHATQNA_FRONTEND_SERVICE_PORT=18101 +export CHATQNA_LLM_FAQGEN_PORT=18011 +export CHATQNA_NGINX_PORT=18104 +export CHATQNA_REDIS_DATAPREP_PORT=18103 +export CHATQNA_REDIS_RETRIEVER_PORT=7000 +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_VLLM_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" +export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP_EXTERNAL} +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" +export LLM_ENDPOINT="http://${HOST_IP}:${CHATQNA_VLLM_SERVICE_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" +export CHATQNA_TYPE="CHATQNA_FAQGEN" +export FAQGen_COMPONENT_NAME="OpeaFaqGenvLLM" diff --git a/ChatQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh b/ChatQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh new file mode 100644 index 0000000000..2d1c3920fd --- /dev/null +++ b/ChatQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh @@ -0,0 +1,39 @@ +#!/usr/bin/env bash + +# Copyright (C) 2025 Advanced Micro Devices, Inc. 
+ +export HOST_IP='' +export HOST_IP_EXTERNAL='' + +export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=18102 +export CHATQNA_FRONTEND_SERVICE_PORT=18101 +export CHATQNA_NGINX_PORT=18104 +export CHATQNA_REDIS_DATAPREP_PORT=18103 +export CHATQNA_REDIS_RETRIEVER_PORT=7000 +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_VLLM_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" +export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP_EXTERNAL} +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" +export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} +export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" diff --git a/ChatQnA/docker_image_build/build.yaml b/ChatQnA/docker_image_build/build.yaml index 78d863db17..35dccaa9b4 100644 --- a/ChatQnA/docker_image_build/build.yaml +++ b/ChatQnA/docker_image_build/build.yaml @@ -65,6 +65,11 @@ services: dockerfile: comps/guardrails/src/guardrails/Dockerfile extends: chatqna image: ${REGISTRY:-opea}/guardrails:${TAG:-latest} + vllm-rocm: + build: + context: GenAIComps + dockerfile: comps/third_parties/vllm/src/Dockerfile.amd_gpu + image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest} vllm: build: context: vllm diff --git a/ChatQnA/tests/test_compose_faqgen_on_rocm.sh b/ChatQnA/tests/test_compose_faqgen_on_rocm.sh index cdfc79c5e7..b0d3559629 100644 --- a/ChatQnA/tests/test_compose_faqgen_on_rocm.sh +++ b/ChatQnA/tests/test_compose_faqgen_on_rocm.sh @@ -9,44 +9,51 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" echo "TAG=IMAGE_TAG=${IMAGE_TAG}" export REGISTRY=${IMAGE_REPO} export TAG=${IMAGE_TAG} -export MODEL_CACHE=${model_cache:-"/var/opea/chatqna-service/data"} +export MODEL_CACHE=${model_cache:-"./data"} WORKPATH=$(dirname "$PWD") LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') export HOST_IP=${ip_address} -export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm" +export HOST_IP_EXTERNAL=${ip_address} + export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" -export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" -export CHATQNA_TGI_SERVICE_PORT=9009 -export CHATQNA_TEI_EMBEDDING_PORT=8090 -export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" -export 
CHATQNA_TEI_RERANKING_PORT=8808 -export CHATQNA_REDIS_VECTOR_PORT=6379 -export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 -export CHATQNA_REDIS_DATAPREP_PORT=6007 +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=8888 +export CHATQNA_FRONTEND_SERVICE_PORT=5173 +export CHATQNA_LLM_FAQGEN_PORT=18011 +export CHATQNA_NGINX_PORT=80 +export CHATQNA_REDIS_DATAPREP_PORT=18103 export CHATQNA_REDIS_RETRIEVER_PORT=7000 -export CHATQNA_LLM_FAQGEN_PORT=18010 -export CHATQNA_INDEX_NAME="rag-redis" -export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" -export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" -export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" -export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" -export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} -export CHATQNA_FRONTEND_SERVICE_PORT=15173 -export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_TGI_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP} -export CHATQNA_BACKEND_SERVICE_PORT=8888 -export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_NGINX_PORT=80 -export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_LLM_ENDPOINT="http://${HOST_IP}:${CHATQNA_TGI_SERVICE_PORT}" +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" +export FAQGen_COMPONENT_NAME="OpeaFaqGenTgi" + export PATH="~/miniconda3/bin:$PATH" function build_docker_images() { @@ -83,7 +90,7 @@ function start_services() { n=0 until [[ "$n" -ge 160 ]]; do - docker logs chatqna-tgi-server > "${LOG_PATH}"/tgi_service_start.log + docker logs chatqna-tgi-service > "${LOG_PATH}"/tgi_service_start.log if grep -q Connected "${LOG_PATH}"/tgi_service_start.log; then break fi @@ -141,66 +148,37 @@ function validate_microservices() { # tei for embedding service validate_service \ - "${ip_address}:8090/embed" \ + "${ip_address}:${CHATQNA_TEI_EMBEDDING_PORT}/embed" \ "[[" \ "tei-embedding" \ - "chatqna-tei-embedding-server" \ + "chatqna-tei-embedding-service" \ '{"inputs":"What is Deep 
Learning?"}' sleep 1m # retrieval can't curl as expected, try to wait for more time - # test /v1/dataprep/ingest upload file - echo "Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to analyze various levels of abstract data representations. It enables computers to identify patterns and make decisions with minimal human intervention by learning from large amounts of data." > "$LOG_PATH"/dataprep_file.txt - validate_service \ - "http://${ip_address}:6007/v1/dataprep/ingest" \ - "Data preparation succeeded" \ - "dataprep_upload_file" \ - "dataprep-redis-server" - - # test /v1/dataprep/ingest upload link - validate_service \ - "http://${ip_address}:6007/v1/dataprep/ingest" \ - "Data preparation succeeded" \ - "dataprep_upload_link" \ - "dataprep-redis-server" - - # test /v1/dataprep/get - validate_service \ - "http://${ip_address}:6007/v1/dataprep/get" \ - '{"name":' \ - "dataprep_get" \ - "dataprep-redis-server" - - # test /v1/dataprep/delete - validate_service \ - "http://${ip_address}:6007/v1/dataprep/delete" \ - '{"status":true}' \ - "dataprep_del" \ - "dataprep-redis-server" - # retrieval microservice test_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") validate_service \ - "${ip_address}:7000/v1/retrieval" \ - "retrieved_docs" \ + "${ip_address}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval" \ + " " \ "retrieval-microservice" \ - "chatqna-retriever-redis-server" \ + "chatqna-retriever" \ "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${test_embedding}}" # tei for rerank microservice validate_service \ - "${ip_address}:8808/rerank" \ + "${ip_address}:${CHATQNA_TEI_RERANKING_PORT}/rerank" \ '{"index":1,"score":' \ "tei-rerank" \ - "chatqna-tei-reranking-server" \ + "chatqna-tei-reranking-service" \ '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' # tgi for llm service validate_service \ - "${ip_address}:9009/generate" \ + "${ip_address}:${CHATQNA_TGI_SERVICE_PORT}/generate" \ "generated_text" \ "tgi-llm" \ - "chatqna-tgi-server" \ + "chatqna-tgi-service" \ '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' # faqgen llm microservice @@ -209,7 +187,7 @@ function validate_microservices() { "${ip_address}:${CHATQNA_LLM_FAQGEN_PORT}/v1/faqgen" \ "text" \ "llm" \ - "llm-faqgen-server" \ + "chatqna-llm-faqgen" \ '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' } @@ -217,14 +195,14 @@ function validate_microservices() { function validate_megaservice() { # Curl the Mega Service validate_service \ - "${ip_address}:8888/v1/chatqna" \ + "${ip_address}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" \ "Embed" \ "chatqna-megaservice" \ "chatqna-backend-server" \ '{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. 
TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":32}' validate_service \ - "${ip_address}:8888/v1/chatqna" \ + "${ip_address}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" \ "Embed" \ "chatqna-megaservice" \ "chatqna-backend-server" \ @@ -236,7 +214,7 @@ function validate_frontend() { echo "[ TEST INFO ]: --------- frontend test started ---------" cd "$WORKPATH"/ui/svelte local conda_env_name="OPEA_e2e" - export PATH=${HOME}/miniforge3/bin/:$PATH + export PATH=${HOME}/miniconda3/bin/:$PATH if conda info --envs | grep -q "$conda_env_name"; then echo "$conda_env_name exist!" else diff --git a/ChatQnA/tests/test_compose_faqgen_vllm_on_rocm.sh b/ChatQnA/tests/test_compose_faqgen_vllm_on_rocm.sh new file mode 100644 index 0000000000..335c204e0a --- /dev/null +++ b/ChatQnA/tests/test_compose_faqgen_vllm_on_rocm.sh @@ -0,0 +1,238 @@ +#!/bin/bash +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +set -e +IMAGE_REPO=${IMAGE_REPO:-"opea"} +IMAGE_TAG=${IMAGE_TAG:-"latest"} +echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" +echo "TAG=IMAGE_TAG=${IMAGE_TAG}" +export REGISTRY=${IMAGE_REPO} +export TAG=${IMAGE_TAG} + +WORKPATH=$(dirname "$PWD") +LOG_PATH="$WORKPATH/tests" +ip_address=$(hostname -I | awk '{print $1}') + +export HOST_IP=${ip_address} +export HOST_IP_EXTERNAL=${ip_address} + +export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=8888 +export CHATQNA_FRONTEND_SERVICE_PORT=5173 +export CHATQNA_LLM_FAQGEN_PORT=18011 +export CHATQNA_NGINX_PORT=80 +export CHATQNA_REDIS_DATAPREP_PORT=18103 +export CHATQNA_REDIS_RETRIEVER_PORT=7000 +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_VLLM_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" +export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP_EXTERNAL} +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" +export LLM_ENDPOINT="http://${HOST_IP}:${CHATQNA_VLLM_SERVICE_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" +export CHATQNA_TYPE="CHATQNA_FAQGEN" +export FAQGen_COMPONENT_NAME="OpeaFaqGenvLLM" + +function build_docker_images() { + opea_branch=${opea_branch:-"main"} + # If the opea_branch isn't main, replace the git clone branch in Dockerfile. + if [[ "${opea_branch}" != "main" ]]; then + cd $WORKPATH + OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git" + NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git" + find . 
-type f -name "Dockerfile*" | while read -r file; do + echo "Processing file: $file" + sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file" + done + fi + + cd $WORKPATH/docker_image_build + git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git + git clone --depth 1 https://github.com/vllm-project/vllm.git + + echo "Build all the images with --no-cache, check docker_image_build.log for details..." + service_list="chatqna chatqna-ui dataprep retriever vllm-rocm llm-faqgen nginx" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + + docker images && sleep 1s +} + +function start_services() { + cd "$WORKPATH"/docker_compose/amd/gpu/rocm + + # Start Docker Containers + docker compose -f compose_faqgen_vllm.yaml up -d > "${LOG_PATH}"/start_services_with_compose.log + + n=0 + until [[ "$n" -ge 500 ]]; do + docker logs chatqna-vllm-service >& "${LOG_PATH}"/chatqna-vllm-service_start.log + if grep -q "Application startup complete" "${LOG_PATH}"/chatqna-vllm-service_start.log; then + break + fi + sleep 20s + n=$((n+1)) + done +} + +function validate_service() { + local URL="$1" + local EXPECTED_RESULT="$2" + local SERVICE_NAME="$3" + local DOCKER_NAME="$4" + local INPUT_DATA="$5" + + local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL") + if [ "$HTTP_STATUS" -eq 200 ]; then + echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..." + + local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log) + + if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then + echo "[ $SERVICE_NAME ] Content is as expected." + else + echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + else + echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + sleep 1s +} + +function validate_microservices() { + # Check if the microservices are running correctly. + + # tei for embedding service + validate_service \ + "${ip_address}:${CHATQNA_TEI_EMBEDDING_PORT}/embed" \ + "\[\[" \ + "tei-embedding" \ + "chatqna-tei-embedding-service" \ + '{"inputs":"What is Deep Learning?"}' + + sleep 1m # retrieval can't curl as expected, try to wait for more time + + # retrieval microservice + test_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") + validate_service \ + "${ip_address}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval" \ + " " \ + "retrieval" \ + "chatqna-retriever" \ + "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${test_embedding}}" + + # tei for rerank microservice + validate_service \ + "${ip_address}:${CHATQNA_TEI_RERANKING_PORT}/rerank" \ + '{"index":1,"score":' \ + "tei-rerank" \ + "chatqna-tei-reranking-service" \ + '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' + + # faqgen llm microservice + echo "validate llm-faqgen..." + validate_service \ + "${ip_address}:${CHATQNA_LLM_FAQGEN_PORT}/v1/faqgen" \ + "text" \ + "llm" \ + "chatqna-llm-faqgen" \ + '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. 
TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' + + # vllm for llm service + validate_service \ + "${ip_address}:${CHATQNA_VLLM_SERVICE_PORT}/v1/chat/completions" \ + "content" \ + "vllm-llm" \ + "chatqna-vllm-service" \ + '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 17}' +} + +function validate_megaservice() { + # Curl the Mega Service + validate_service \ + "${ip_address}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" \ + "data" \ + "mega-chatqna" \ + "chatqna-backend-server" \ + '{"messages": "What is the revenue of Nike in 2023?"}' + +} + +function validate_frontend() { + cd $WORKPATH/ui/svelte + local conda_env_name="OPEA_e2e" + export PATH=${HOME}/miniconda3/bin/:$PATH + if conda info --envs | grep -q "$conda_env_name"; then + echo "$conda_env_name exist!" + else + conda create -n ${conda_env_name} python=3.12 -y + fi + + source activate ${conda_env_name} + + sed -i "s/localhost/$ip_address/g" playwright.config.ts + + conda install -c conda-forge nodejs=22.6.0 -y + npm install && npm ci && npx playwright install --with-deps + node -v && npm -v && pip list + + exit_status=0 + npx playwright test || exit_status=$? + + if [ $exit_status -ne 0 ]; then + echo "[TEST INFO]: ---------frontend test failed---------" + exit $exit_status + else + echo "[TEST INFO]: ---------frontend test passed---------" + fi +} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm + docker compose -f compose_faqgen_vllm.yaml down +} + +function main() { + + stop_docker + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_time=$(date +%s) + start_services + end_time=$(date +%s) + duration=$((end_time-start_time)) + echo "Mega service start duration is $duration s" && sleep 1s + + if [ "${mode}" == "perf" ]; then + python3 $WORKPATH/tests/chatqna_benchmark.py + elif [ "${mode}" == "" ]; then + validate_microservices + validate_megaservice + validate_frontend + fi + + stop_docker + echo y | docker system prune + +} + +main diff --git a/ChatQnA/tests/test_compose_on_rocm.sh b/ChatQnA/tests/test_compose_on_rocm.sh index f9623f1691..60dd76aebe 100644 --- a/ChatQnA/tests/test_compose_on_rocm.sh +++ b/ChatQnA/tests/test_compose_on_rocm.sh @@ -16,36 +16,41 @@ LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') export HOST_IP=${ip_address} -export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm" +export HOST_IP_EXTERNAL=${ip_address} + export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" -export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" -export CHATQNA_TGI_SERVICE_PORT=9009 -export CHATQNA_TEI_EMBEDDING_PORT=8090 -export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" -export CHATQNA_TEI_RERANKING_PORT=8808 -export CHATQNA_REDIS_VECTOR_PORT=6379 -export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 -export CHATQNA_REDIS_DATAPREP_PORT=6007 +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=8888 +export CHATQNA_FRONTEND_SERVICE_PORT=5173 +export CHATQNA_NGINX_PORT=80 +export CHATQNA_REDIS_DATAPREP_PORT=18103 export CHATQNA_REDIS_RETRIEVER_PORT=7000 -export CHATQNA_INDEX_NAME="rag-redis" -export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} -export
CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" -export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" -export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" -export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" -export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} -export CHATQNA_FRONTEND_SERVICE_PORT=15173 -export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_TGI_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP} -export CHATQNA_BACKEND_SERVICE_PORT=8888 -export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} -export CHATQNA_NGINX_PORT=80 -export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" + export PATH="~/miniconda3/bin:$PATH" function build_docker_images() { @@ -82,7 +87,7 @@ function start_services() { n=0 until [[ "$n" -ge 160 ]]; do - docker logs chatqna-tgi-server > "${LOG_PATH}"/tgi_service_start.log + docker logs chatqna-tgi-service > "${LOG_PATH}"/tgi_service_start.log if grep -q Connected "${LOG_PATH}"/tgi_service_start.log; then break fi @@ -140,66 +145,37 @@ function validate_microservices() { # tei for embedding service validate_service \ - "${ip_address}:8090/embed" \ + "${ip_address}:${CHATQNA_TEI_EMBEDDING_PORT}/embed" \ "[[" \ "tei-embedding" \ - "chatqna-tei-embedding-server" \ + "chatqna-tei-embedding-service" \ '{"inputs":"What is Deep Learning?"}' sleep 1m # retrieval can't curl as expected, try to wait for more time - # test /v1/dataprep/ingest upload file - echo "Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to analyze various levels of abstract data representations. It enables computers to identify patterns and make decisions with minimal human intervention by learning from large amounts of data." 
> "$LOG_PATH"/dataprep_file.txt - validate_service \ - "http://${ip_address}:6007/v1/dataprep/ingest" \ - "Data preparation succeeded" \ - "dataprep_upload_file" \ - "dataprep-redis-server" - - # test /v1/dataprep/ingest upload link - validate_service \ - "http://${ip_address}:6007/v1/dataprep/ingest" \ - "Data preparation succeeded" \ - "dataprep_upload_link" \ - "dataprep-redis-server" - - # test /v1/dataprep/get - validate_service \ - "http://${ip_address}:6007/v1/dataprep/get" \ - '{"name":' \ - "dataprep_get" \ - "dataprep-redis-server" - - # test /v1/dataprep/delete - validate_service \ - "http://${ip_address}:6007/v1/dataprep/delete" \ - '{"status":true}' \ - "dataprep_del" \ - "dataprep-redis-server" - # retrieval microservice test_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") validate_service \ - "${ip_address}:7000/v1/retrieval" \ - "retrieved_docs" \ + "${ip_address}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval" \ + " " \ "retrieval-microservice" \ - "chatqna-retriever-redis-server" \ + "chatqna-retriever" \ "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${test_embedding}}" # tei for rerank microservice validate_service \ - "${ip_address}:8808/rerank" \ + "${ip_address}:${CHATQNA_TEI_RERANKING_PORT}/rerank" \ '{"index":1,"score":' \ "tei-rerank" \ - "chatqna-tei-reranking-server" \ + "chatqna-tei-reranking-service" \ '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' # tgi for llm service validate_service \ - "${ip_address}:9009/generate" \ + "${ip_address}:${CHATQNA_TGI_SERVICE_PORT}/generate" \ "generated_text" \ "tgi-llm" \ - "chatqna-tgi-server" \ + "chatqna-tgi-service" \ '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' } @@ -207,7 +183,7 @@ function validate_microservices() { function validate_megaservice() { # Curl the Mega Service validate_service \ - "${ip_address}:8888/v1/chatqna" \ + "${ip_address}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" \ "Nike" \ "chatqna-megaservice" \ "chatqna-backend-server" \ @@ -219,7 +195,7 @@ function validate_frontend() { echo "[ TEST INFO ]: --------- frontend test started ---------" cd "$WORKPATH"/ui/svelte local conda_env_name="OPEA_e2e" - export PATH=${HOME}/miniforge3/bin/:$PATH + export PATH=${HOME}/miniconda3/bin/:$PATH if conda info --envs | grep -q "$conda_env_name"; then echo "$conda_env_name exist!" 
else diff --git a/ChatQnA/tests/test_compose_vllm_on_rocm.sh b/ChatQnA/tests/test_compose_vllm_on_rocm.sh new file mode 100644 index 0000000000..12a489f18e --- /dev/null +++ b/ChatQnA/tests/test_compose_vllm_on_rocm.sh @@ -0,0 +1,230 @@ +#!/bin/bash +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +set -e +IMAGE_REPO=${IMAGE_REPO:-"opea"} +IMAGE_TAG=${IMAGE_TAG:-"latest"} +echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}" +echo "TAG=IMAGE_TAG=${IMAGE_TAG}" +export REGISTRY=${IMAGE_REPO} +export TAG=${IMAGE_TAG} + +WORKPATH=$(dirname "$PWD") +LOG_PATH="$WORKPATH/tests" +ip_address=$(hostname -I | awk '{print $1}') + +export HOST_IP=${ip_address} +export HOST_IP_EXTERNAL=${ip_address} + +export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5" +export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} +export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct" +export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base" + +export CHATQNA_BACKEND_SERVICE_PORT=8888 +export CHATQNA_FRONTEND_SERVICE_PORT=5173 +export CHATQNA_NGINX_PORT=80 +export CHATQNA_REDIS_DATAPREP_PORT=18103 +export CHATQNA_REDIS_RETRIEVER_PORT=7000 +export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001 +export CHATQNA_REDIS_VECTOR_PORT=6379 +export CHATQNA_TEI_EMBEDDING_PORT=18090 +export CHATQNA_TEI_RERANKING_PORT=18808 +export CHATQNA_VLLM_SERVICE_PORT=18008 + +export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" +export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP_EXTERNAL} +export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete" +export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get" +export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP_EXTERNAL}:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/ingest" +export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP} +export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}" +export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} +export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}" + +export CHATQNA_BACKEND_SERVICE_NAME=chatqna +export CHATQNA_INDEX_NAME="rag-redis" + + +function build_docker_images() { + opea_branch=${opea_branch:-"main"} + # If the opea_branch isn't main, replace the git clone branch in Dockerfile. + if [[ "${opea_branch}" != "main" ]]; then + cd $WORKPATH + OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git" + NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git" + find . -type f -name "Dockerfile*" | while read -r file; do + echo "Processing file: $file" + sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file" + done + fi + + cd $WORKPATH/docker_image_build + git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git + git clone --depth 1 https://github.com/vllm-project/vllm.git + + echo "Build all the images with --no-cache, check docker_image_build.log for details..." 
+ service_list="chatqna chatqna-ui dataprep retriever vllm-rocm nginx" + docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log + + docker images && sleep 1s +} + +function start_services() { + cd "$WORKPATH"/docker_compose/amd/gpu/rocm + + # Start Docker Containers + docker compose -f compose_vllm.yaml up -d > "${LOG_PATH}"/start_services_with_compose.log + + n=0 + until [[ "$n" -ge 500 ]]; do + docker logs chatqna-vllm-service >& "${LOG_PATH}"/chatqna-vllm-service_start.log + if grep -q "Application startup complete" "${LOG_PATH}"/chatqna-vllm-service_start.log; then + break + fi + sleep 20s + n=$((n+1)) + done +} + +function validate_service() { + local URL="$1" + local EXPECTED_RESULT="$2" + local SERVICE_NAME="$3" + local DOCKER_NAME="$4" + local INPUT_DATA="$5" + + local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL") + if [ "$HTTP_STATUS" -eq 200 ]; then + echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..." + + local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log) + + if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then + echo "[ $SERVICE_NAME ] Content is as expected." + else + echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + else + echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS" + docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log + exit 1 + fi + sleep 1s +} + +function validate_microservices() { + # Check if the microservices are running correctly. + + # tei for embedding service + validate_service \ + "${ip_address}:${CHATQNA_TEI_EMBEDDING_PORT}/embed" \ + "\[\[" \ + "tei-embedding" \ + "chatqna-tei-embedding-service" \ + '{"inputs":"What is Deep Learning?"}' + + sleep 1m # retrieval can't curl as expected, try to wait for more time + + # retrieval microservice + test_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") + validate_service \ + "${ip_address}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval" \ + " " \ + "retrieval" \ + "chatqna-retriever" \ + "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${test_embedding}}" + + # tei for rerank microservice + validate_service \ + "${ip_address}:${CHATQNA_TEI_RERANKING_PORT}/rerank" \ + '{"index":1,"score":' \ + "tei-rerank" \ + "chatqna-tei-reranking-service" \ + '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' + + # vllm for llm service + validate_service \ + "${ip_address}:${CHATQNA_VLLM_SERVICE_PORT}/v1/chat/completions" \ + "content" \ + "vllm-llm" \ + "chatqna-vllm-service" \ + '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 17}' +} + +function validate_megaservice() { + # Curl the Mega Service + validate_service \ + "${ip_address}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna" \ + "data" \ + "mega-chatqna" \ + "chatqna-backend-server" \ + '{"messages": "What is the revenue of Nike in 2023?"}' + +} + +function validate_frontend() { + cd $WORKPATH/ui/svelte + local conda_env_name="OPEA_e2e" + export PATH=${HOME}/miniconda3/bin/:$PATH + if conda info --envs | grep -q "$conda_env_name"; then + echo "$conda_env_name exist!" 
+ else + conda create -n ${conda_env_name} python=3.12 -y + fi + + source activate ${conda_env_name} + + sed -i "s/localhost/$ip_address/g" playwright.config.ts + + conda install -c conda-forge nodejs=22.6.0 -y + npm install && npm ci && npx playwright install --with-deps + node -v && npm -v && pip list + + exit_status=0 + npx playwright test || exit_status=$? + + if [ $exit_status -ne 0 ]; then + echo "[TEST INFO]: ---------frontend test failed---------" + exit $exit_status + else + echo "[TEST INFO]: ---------frontend test passed---------" + fi +} + +function stop_docker() { + cd $WORKPATH/docker_compose/amd/gpu/rocm + docker compose -f compose_vllm.yaml down +} + +function main() { + + stop_docker + if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi + start_time=$(date +%s) + start_services + end_time=$(date +%s) + duration=$((end_time-start_time)) + echo "Mega service start duration is $duration s" && sleep 1s + + if [ "${mode}" == "perf" ]; then + python3 $WORKPATH/tests/chatqna_benchmark.py + elif [ "${mode}" == "" ]; then + validate_microservices + validate_megaservice + validate_frontend + fi + + stop_docker + echo y | docker system prune + +} + +main diff --git a/ChatQnA/ui/svelte/playwright.config.ts b/ChatQnA/ui/svelte/playwright.config.ts index 937f88bf7b..e26b9f3f8c 100644 --- a/ChatQnA/ui/svelte/playwright.config.ts +++ b/ChatQnA/ui/svelte/playwright.config.ts @@ -21,7 +21,7 @@ export default defineConfig({ * Maximum time expect() should wait for the condition to be met. * For example in `await expect(locator).toHaveText();` */ - timeout: 5000, + timeout: 20000, }, /* Run tests in files in parallel */ fullyParallel: true,