Commit 096a37a

EdgeCraftRAG: Fix multiple issues (#1143)

Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Parent: 6f8fa6a
26 files changed: +335 −302 lines

EdgeCraftRAG/Dockerfile

Lines changed: 2 additions & 4 deletions

```diff
@@ -13,13 +13,11 @@ RUN useradd -m -s /bin/bash user && \
     mkdir -p /home/user && \
     chown -R user /home/user/
 
-COPY ./edgecraftrag /home/user/edgecraftrag
+COPY ./requirements.txt /home/user/requirements.txt
 COPY ./chatqna.py /home/user/chatqna.py
 
-WORKDIR /home/user/edgecraftrag
-RUN pip install --no-cache-dir -r requirements.txt
-
 WORKDIR /home/user
+RUN pip install --no-cache-dir -r requirements.txt
 
 USER user
```
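With this change the mega-service image only needs `requirements.txt` and `chatqna.py`; the `edgecraftrag` package is copied only into the server image (see the next file). A quick, hedged sanity check of the rebuilt image (the tag and proxy build args follow the README; overriding the entrypoint is just a convenient way to poke at the filesystem):

```bash
# Rebuild the mega-service image with the reordered COPY / WORKDIR steps
docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag:latest -f Dockerfile .

# The working directory should resolve to /home/user and the requirements should be installed
docker run --rm --entrypoint pwd opea/edgecraftrag:latest
docker run --rm --entrypoint pip opea/edgecraftrag:latest list
```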

EdgeCraftRAG/Dockerfile.server

Lines changed: 3 additions & 0 deletions

```diff
@@ -25,6 +25,9 @@ RUN useradd -m -s /bin/bash user && \
 
 COPY ./edgecraftrag /home/user/edgecraftrag
 
+RUN mkdir -p /home/user/gradio_cache
+ENV GRADIO_TEMP_DIR=/home/user/gradio_cache
+
 WORKDIR /home/user/edgecraftrag
 RUN pip install --no-cache-dir -r requirements.txt
```
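The server image now pre-creates `/home/user/gradio_cache` and points `GRADIO_TEMP_DIR` at it, so Gradio writes uploaded and temporary files under `/home/user` instead of the default system temp path. A hedged way to verify this inside a running container (the `edgecraftrag-server` container name is taken from the compose hints in the README and may differ in your deployment):

```bash
# Confirm the Gradio temp dir is exported and writable for the container user
# (replace edgecraftrag-server with your actual container name if it differs)
docker exec edgecraftrag-server sh -c 'echo "$GRADIO_TEMP_DIR" && touch "$GRADIO_TEMP_DIR/.probe" && ls -ld "$GRADIO_TEMP_DIR"'
```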

EdgeCraftRAG/README.md

Lines changed: 101 additions & 70 deletions

````diff
@@ -7,39 +7,112 @@ quality and performance.
 
 ## Quick Start Guide
 
-### Run Containers with Docker Compose
+### (Optional) Build Docker Images for Mega Service, Server and UI by your own
+
+If you want to build the images by your own, please follow the steps:
+
+```bash
+cd GenAIExamples/EdgeCraftRAG
+
+docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag:latest -f Dockerfile .
+docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag-server:latest -f Dockerfile.server .
+docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag-ui:latest -f ui/docker/Dockerfile.ui .
+```
+
+### Using Intel Arc GPU
+
+#### Local inference with OpenVINO for Intel Arc GPU
+
+You can select "local" type in generation field which is the default approach to enable Intel Arc GPU for LLM. You don't need to build images for "local" type.
+
+#### vLLM with OpenVINO for Intel Arc GPU
+
+You can also select "vLLM" as generation type, to enable this type, you'll need to build the vLLM image for Intel Arc GPU before service bootstrap.
+Please follow this link [vLLM with OpenVINO](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/text-generation/vllm/langchain#build-docker-image) to build the vLLM image.
+
+### Start Edge Craft RAG Services with Docker Compose
+
+If you want to enable vLLM with OpenVINO service, please finish the steps in [Launch vLLM with OpenVINO service](#optional-launch-vllm-with-openvino-service) first.
 
 ```bash
 cd GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc
 
 export MODEL_PATH="your model path for all your models"
 export DOC_PATH="your doc path for uploading a dir of files"
+export GRADIO_PATH="your gradio cache path for transferring files"
+
+# Make sure all 3 folders have 1000:1000 permission, otherwise
+# chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${GRADIO_PATH}
+
+# Use `ip a` to check your active ip
 export HOST_IP="your host ip"
-export UI_SERVICE_PORT="port for UI service"
 
-# Optional for vllm endpoint
-export vLLM_ENDPOINT="http://${HOST_IP}:8008"
+# Check group id of video and render
+export VIDEOGROUPID=$(getent group video | cut -d: -f3)
+export RENDERGROUPID=$(getent group render | cut -d: -f3)
 
 # If you have a proxy configured, uncomment below line
-# export no_proxy=$no_proxy,${HOST_IP},edgecraftrag,edgecraftrag-server
+# export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
+# export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
 # If you have a HF mirror configured, it will be imported to the container
 # export HF_ENDPOINT="your HF mirror endpoint"
 
 # By default, the ports of the containers are set, uncomment if you want to change
 # export MEGA_SERVICE_PORT=16011
 # export PIPELINE_SERVICE_PORT=16011
+# export UI_SERVICE_PORT="8082"
+
+# Prepare models for embedding, reranking and generation, you can also choose other OpenVINO optimized models
+# Here is the example:
+pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
+
+optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
+optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task sentence-similarity
+optimum-cli export openvino -m Qwen/Qwen2-7B-Instruct ${MODEL_PATH}/Qwen/Qwen2-7B-Instruct/INT4_compressed_weights --weight-format int4
 
 docker compose up -d
+
 ```
 
-### (Optional) Build Docker Images for Mega Service, Server and UI by your own
+#### (Optional) Launch vLLM with OpenVINO service
+
+1. Set up Environment Variables
 
 ```bash
-cd GenAIExamples/EdgeCraftRAG
+export LLM_MODEL=#your model id
+export VLLM_SERVICE_PORT=8008
+export vLLM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVICE_PORT}"
+export HUGGINGFACEHUB_API_TOKEN=#your HF token
+```
 
-docker build --build-arg http_proxy=$HTTP_PROXY --build-arg https_proxy=$HTTPS_PROXY --build-arg no_proxy=$NO_PROXY -t opea/edgecraftrag:latest -f Dockerfile .
-docker build --build-arg http_proxy=$HTTP_PROXY --build-arg https_proxy=$HTTPS_PROXY --build-arg no_proxy=$NO_PROXY -t opea/edgecraftrag-server:latest -f Dockerfile.server .
-docker build --build-arg http_proxy=$HTTP_PROXY --build-arg https_proxy=$HTTPS_PROXY --build-arg no_proxy=$NO_PROXY -t opea/edgecraftrag-ui:latest -f ui/docker/Dockerfile.ui .
+2. Uncomment below code in 'GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml'
+
+```bash
+# vllm-openvino-server:
+#   container_name: vllm-openvino-server
+#   image: opea/vllm-arc:latest
+#   ports:
+#     - ${VLLM_SERVICE_PORT:-8008}:80
+#   environment:
+#     HTTPS_PROXY: ${https_proxy}
+#     HTTP_PROXY: ${https_proxy}
+#     VLLM_OPENVINO_DEVICE: GPU
+#     HF_ENDPOINT: ${HF_ENDPOINT}
+#     HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+#   volumes:
+#     - /dev/dri/by-path:/dev/dri/by-path
+#     - $HOME/.cache/huggingface:/root/.cache/huggingface
+#   devices:
+#     - /dev/dri
+#   entrypoint: /bin/bash -c "\
+#     cd / && \
+#     export VLLM_CPU_KVCACHE_SPACE=50 && \
+#     export VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON && \
+#     python3 -m vllm.entrypoints.openai.api_server \
+#       --model '${LLM_MODEL}' \
+#       --max_model_len=1024 \
+#       --host 0.0.0.0 \
+#       --port 80"
 ```
 
 ### ChatQnA with LLM Example (Command Line)
````
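If you enable the optional `vllm-openvino-server`, it exposes the standard vLLM OpenAI-compatible API on `${VLLM_SERVICE_PORT}` (8008 by default), so a quick smoke test is possible before wiring it into a pipeline. A minimal sketch (the prompt and token count are arbitrary):

```bash
# List the model(s) served by the vLLM OpenVINO container
curl -s http://${HOST_IP}:${VLLM_SERVICE_PORT:-8008}/v1/models | jq '.'

# Request a short completion to confirm generation works end to end on the Arc GPU
curl -s http://${HOST_IP}:${VLLM_SERVICE_PORT:-8008}/v1/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL}\", \"prompt\": \"Hello\", \"max_tokens\": 16}" | jq '.'
```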
````diff
@@ -109,7 +182,7 @@ curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: app
 # }
 
 # Prepare data from local directory
-curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"#REPLACE WITH YOUR LOCAL DOC DIR#"}' | jq '.'
+curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
 
 # Validate Mega Service
 curl -X POST http://${HOST_IP}:16011/v1/chatqna -H "Content-Type: application/json" -d '{"messages":"#REPLACE WITH YOUR QUESTION HERE#", "top_n":5, "max_tokens":512}' | jq '.'
````
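Note that `local_path` now refers to a location under the mounted `${DOC_PATH}` volume, addressed with the `docs/` prefix shown above, rather than an arbitrary host path. For example, if your files sit in `${DOC_PATH}/my_reports` on the host (`my_reports` being a hypothetical sub-directory), the request becomes:

```bash
curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/my_reports"}' | jq '.'
```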
````diff
@@ -121,33 +194,14 @@ Open your browser, access http://${HOST_IP}:8082
 
 > Your browser should be running on the same host of your console, otherwise you will need to access UI with your host domain name instead of ${HOST_IP}.
 
-### (Optional) Launch vLLM with OpenVINO service
+To create a default pipeline, you need to click the `Create Pipeline` button on the `RAG Settings` page. You can also create multiple pipelines or update existing pipelines through the `Pipeline Configuration`, but please note that active pipelines cannot be updated.
+![create_pipeline](assets/img/create_pipeline.png)
 
-```bash
-# 1. export LLM_MODEL
-export LLM_MODEL="your model id"
-# 2. Uncomment below code in 'GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml'
-# vllm-service:
-#   image: vllm:openvino
-#   container_name: vllm-openvino-server
-#   depends_on:
-#     - vllm-service
-#   ports:
-#     - "8008:80"
-#   environment:
-#     no_proxy: ${no_proxy}
-#     http_proxy: ${http_proxy}
-#     https_proxy: ${https_proxy}
-#     vLLM_ENDPOINT: ${vLLM_ENDPOINT}
-#     LLM_MODEL: ${LLM_MODEL}
-#   entrypoint: /bin/bash -c "\
-#     cd / && \
-#     export VLLM_CPU_KVCACHE_SPACE=50 && \
-#     python3 -m vllm.entrypoints.openai.api_server \
-#       --model '${LLM_MODEL}' \
-#       --host 0.0.0.0 \
-#       --port 80"
-```
+After the pipeline creation, you can upload your data in the `Chatbot` page.
+![upload_data](assets/img/upload_data.png)
+
+Then, you can submit messages in the chat box.
+![chat_with_rag](assets/img/chat_with_rag.png)
 
 ## Advanced User Guide
 
````
````diff
@@ -156,27 +210,13 @@ export LLM_MODEL="your model id"
 #### Create a pipeline
 
 ```bash
-curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @examples/test_pipeline.json | jq '.'
-```
-
-It will take some time to prepare the embedding model.
-
-#### Upload a text
-
-```bash
-curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data.json | jq '.'
-```
-
-#### Provide a query to retrieve context with similarity search.
-
-```bash
-curl -X POST http://${HOST_IP}:16010/v1/retrieval -H "Content-Type: application/json" -d @examples/test_query.json | jq '.'
+curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @tests/test_pipeline_local_llm.json | jq '.'
 ```
 
-#### Create the second pipeline test2
+#### Update a pipeline
 
 ```bash
-curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @examples/test_pipeline2.json | jq '.'
+curl -X PATCH http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @tests/test_pipeline_local_llm.json | jq '.'
 ```
 
 #### Check all pipelines
````
````diff
@@ -185,27 +225,18 @@ curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: app
 curl -X GET http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" | jq '.'
 ```
 
-#### Compare similarity retrieval (test1) and keyword retrieval (test2)
+#### Activate a pipeline
 
 ```bash
-# Activate pipeline test1
 curl -X PATCH http://${HOST_IP}:16010/v1/settings/pipelines/test1 -H "Content-Type: application/json" -d '{"active": "true"}' | jq '.'
-# Similarity retrieval
-curl -X POST http://${HOST_IP}:16010/v1/retrieval -H "Content-Type: application/json" -d '{"messages":"number"}' | jq '.'
-
-# Activate pipeline test2
-curl -X PATCH http://${HOST_IP}:16010/v1/settings/pipelines/test2 -H "Content-Type: application/json" -d '{"active": "true"}' | jq '.'
-# Keyword retrieval
-curl -X POST http://${HOST_IP}:16010/v1/retrieval -H "Content-Type: application/json" -d '{"messages":"number"}' | jq '.'
-
 ```
 
 ### Model Management
 
 #### Load a model
 
 ```bash
-curl -X POST http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: application/json" -d @examples/test_model_load.json | jq '.'
+curl -X POST http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: application/json" -d '{"model_type": "reranker", "model_id": "BAAI/bge-reranker-large", "model_path": "./models/bge_ov_reranker", "device": "cpu"}' | jq '.'
 ```
 
 It will take some time to load the model.
````
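Model loading can take a while; once the request returns, the models endpoint shown in the next hunk can be used to confirm the reranker was registered (a hedged check, since the response format is not shown here):

```bash
# Check the registered models; BAAI/bge-reranker-large should appear once loading completes
curl -X GET http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: application/json" | jq '.'
```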
````diff
@@ -219,7 +250,7 @@ curl -X GET http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: applica
 #### Update a model
 
 ```bash
-curl -X PATCH http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" -d @examples/test_model_update.json | jq '.'
+curl -X PATCH http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" -d '{"model_type": "reranker", "model_id": "BAAI/bge-reranker-large", "model_path": "./models/bge_ov_reranker", "device": "gpu"}' | jq '.'
 ```
 
 #### Check a certain model
````
````diff
@@ -239,14 +270,14 @@ curl -X DELETE http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-larg
 #### Add a text
 
 ```bash
-curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data.json | jq '.'
+curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"text":"#REPLACE WITH YOUR TEXT"}' | jq '.'
 ```
 
 #### Add files from existed file path
 
 ```bash
-curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data_dir.json | jq '.'
-curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data_file.json | jq '.'
+curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
+curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR FILE WITHIN MOUNTED DOC PATH#"}' | jq '.'
 ```
 
 #### Check all files
````
````diff
@@ -270,5 +301,5 @@ curl -X DELETE http://${HOST_IP}:16010/v1/data/files/test2.docx -H "Content-Type
 #### Update a file
 
 ```bash
-curl -X PATCH http://${HOST_IP}:16010/v1/data/files/test.pdf -H "Content-Type: application/json" -d @examples/test_data_file.json | jq '.'
+curl -X PATCH http://${HOST_IP}:16010/v1/data/files/test.pdf -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR FILE WITHIN MOUNTED DOC PATH#"}' | jq '.'
 ```
````
Three new binary image assets (92.4 KB, 168 KB, 85.8 KB) are included in this commit; the README above references them as assets/img/create_pipeline.png, assets/img/upload_data.png, and assets/img/chat_with_rag.png (previews not shown).

EdgeCraftRAG/chatqna.py

Lines changed: 16 additions & 2 deletions

```diff
@@ -18,6 +18,7 @@
     ChatMessage,
     UsageInfo,
 )
+from comps.cores.proto.docarray import LLMParams
 from fastapi import Request
 from fastapi.responses import StreamingResponse
 
@@ -30,7 +31,20 @@ def __init__(self, megaservice, host="0.0.0.0", port=16011):
 
     async def handle_request(self, request: Request):
         input = await request.json()
-        result_dict, runtime_graph = await self.megaservice.schedule(initial_inputs=input)
+        stream_opt = input.get("stream", False)
+        chat_request = ChatCompletionRequest.parse_obj(input)
+        parameters = LLMParams(
+            max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
+            top_k=chat_request.top_k if chat_request.top_k else 10,
+            top_p=chat_request.top_p if chat_request.top_p else 0.95,
+            temperature=chat_request.temperature if chat_request.temperature else 0.01,
+            frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0,
+            presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
+            repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
+            streaming=stream_opt,
+            chat_template=chat_request.chat_template if chat_request.chat_template else None,
+        )
+        result_dict, runtime_graph = await self.megaservice.schedule(initial_inputs=input, llm_parameters=parameters)
         for node, response in result_dict.items():
             if isinstance(response, StreamingResponse):
                 return response
@@ -61,7 +75,7 @@ def add_remote_service(self):
             port=PIPELINE_SERVICE_PORT,
             endpoint="/v1/chatqna",
             use_remote_service=True,
-            service_type=ServiceType.UNDEFINED,
+            service_type=ServiceType.LLM,
         )
         self.megaservice.add(edgecraftrag)
         self.gateway = EdgeCraftRagGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
```
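With `handle_request` now parsing the body as a `ChatCompletionRequest` and passing `LLMParams` (plus the `stream` flag) into `megaservice.schedule`, client-supplied generation parameters reach the pipeline instead of being dropped. A hedged example against the mega service (the field values are illustrative; omitted fields fall back to the defaults shown in the diff):

```bash
# Sampling parameters and streaming are now forwarded to the pipeline service
curl -X POST http://${HOST_IP}:16011/v1/chatqna -H "Content-Type: application/json" \
  -d '{"messages":"#REPLACE WITH YOUR QUESTION HERE#", "max_tokens":256, "temperature":0.2, "top_p":0.9, "stream":true}'
```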
