@@ -7,39 +7,112 @@ quality and performance.

## Quick Start Guide

- ### Run Containers with Docker Compose
+ ### (Optional) Build Docker Images for Mega Service, Server and UI on your own
+
+ If you want to build the images on your own, please follow these steps:
+
+ ```bash
+ cd GenAIExamples/EdgeCraftRAG
+
+ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag:latest -f Dockerfile .
+ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag-server:latest -f Dockerfile.server .
+ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t opea/edgecraftrag-ui:latest -f ui/docker/Dockerfile.ui .
+ ```
+
+ ### Using Intel Arc GPU
+
+ #### Local inference with OpenVINO for Intel Arc GPU
+
+ You can select the "local" type in the generation field, which is the default approach to enable the Intel Arc GPU for the LLM. You don't need to build any extra image for the "local" type.
+
+ #### vLLM with OpenVINO for Intel Arc GPU
+
+ You can also select "vLLM" as the generation type. To enable it, you'll need to build the vLLM image for Intel Arc GPU before service bootstrap.
+ Please follow this link [vLLM with OpenVINO](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/text-generation/vllm/langchain#build-docker-image) to build the vLLM image.
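+
+ As a rough illustration of that flow (the Dockerfile location is not spelled out here, so it is left as a placeholder — use the path the linked guide specifies), the build boils down to cloning GenAIComps and tagging the image as `opea/vllm-arc:latest` so it matches the image name referenced in `compose.yaml`:
+
+ ```bash
+ # Sketch only; the exact Dockerfile comes from the linked build instructions
+ git clone https://github.com/opea-project/GenAIComps.git
+ cd GenAIComps
+ DOCKERFILE=<path to the vLLM OpenVINO Dockerfile from the linked guide>
+ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy \
+   -t opea/vllm-arc:latest -f "$DOCKERFILE" .
+ ```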
+
+ ### Start Edge Craft RAG Services with Docker Compose
+
+ If you want to enable vLLM with OpenVINO service, please finish the steps in [Launch vLLM with OpenVINO service](#optional-launch-vllm-with-openvino-service) first.

```bash
cd GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc

export MODEL_PATH="your model path for all your models"
export DOC_PATH="your doc path for uploading a dir of files"
+ export GRADIO_PATH="your gradio cache path for transferring files"
+
+ # Make sure all 3 folders have 1000:1000 permission; otherwise run
+ # chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${GRADIO_PATH}
+
+ # Use `ip a` to check your active ip
export HOST_IP="your host ip"
- export UI_SERVICE_PORT="port for UI service"

- # Optional for vllm endpoint
- export vLLM_ENDPOINT="http://${HOST_IP}:8008"
+ # Check the group ids of video and render
+ export VIDEOGROUPID=$(getent group video | cut -d: -f3)
+ export RENDERGROUPID=$(getent group render | cut -d: -f3)

# If you have a proxy configured, uncomment the lines below
- # export no_proxy=$no_proxy,${HOST_IP},edgecraftrag,edgecraftrag-server
+ # export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
+ # export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
# If you have a HF mirror configured, it will be imported to the container
# export HF_ENDPOINT="your HF mirror endpoint"

# By default, the container ports are preset; uncomment if you want to change them
# export MEGA_SERVICE_PORT=16011
# export PIPELINE_SERVICE_PORT=16011
+ # export UI_SERVICE_PORT="8082"
+
+ # Prepare models for embedding, reranking and generation; you can also choose other OpenVINO-optimized models
+ # Here is an example:
+ pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
+
+ optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
+ optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task sentence-similarity
+ optimum-cli export openvino -m Qwen/Qwen2-7B-Instruct ${MODEL_PATH}/Qwen/Qwen2-7B-Instruct/INT4_compressed_weights --weight-format int4

docker compose up -d
+
```
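+
+ After `docker compose up -d` returns, a quick sanity check (a suggestion, not part of the original flow) is to confirm the containers are running and the pipeline REST endpoint responds:
+
+ ```bash
+ # All EdgeCraftRAG containers should be listed as running
+ docker compose ps
+
+ # The pipeline service should answer with a JSON list (empty until a pipeline is created)
+ curl -X GET http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" | jq '.'
+ ```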

- ### (Optional) Build Docker Images for Mega Service, Server and UI by your own
+ #### (Optional) Launch vLLM with OpenVINO service
+
+ 1. Set up Environment Variables

```bash
- cd GenAIExamples/EdgeCraftRAG
+ export LLM_MODEL="your model id"
+ export VLLM_SERVICE_PORT=8008
+ export vLLM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVICE_PORT}"
+ export HUGGINGFACEHUB_API_TOKEN="your HF token"
+ ```

- docker build --build-arg http_proxy=$HTTP_PROXY --build-arg https_proxy=$HTTPS_PROXY --build-arg no_proxy=$NO_PROXY -t opea/edgecraftrag:latest -f Dockerfile .
- docker build --build-arg http_proxy=$HTTP_PROXY --build-arg https_proxy=$HTTPS_PROXY --build-arg no_proxy=$NO_PROXY -t opea/edgecraftrag-server:latest -f Dockerfile.server .
- docker build --build-arg http_proxy=$HTTP_PROXY --build-arg https_proxy=$HTTPS_PROXY --build-arg no_proxy=$NO_PROXY -t opea/edgecraftrag-ui:latest -f ui/docker/Dockerfile.ui .
+ 2. Uncomment the code below in 'GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml'
+
+ ```bash
+ # vllm-openvino-server:
+ #   container_name: vllm-openvino-server
+ #   image: opea/vllm-arc:latest
+ #   ports:
+ #     - ${VLLM_SERVICE_PORT:-8008}:80
+ #   environment:
+ #     HTTPS_PROXY: ${https_proxy}
+ #     HTTP_PROXY: ${https_proxy}
+ #     VLLM_OPENVINO_DEVICE: GPU
+ #     HF_ENDPOINT: ${HF_ENDPOINT}
+ #     HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+ #   volumes:
+ #     - /dev/dri/by-path:/dev/dri/by-path
+ #     - $HOME/.cache/huggingface:/root/.cache/huggingface
+ #   devices:
+ #     - /dev/dri
+ #   entrypoint: /bin/bash -c "\
+ #     cd / && \
+ #     export VLLM_CPU_KVCACHE_SPACE=50 && \
+ #     export VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON && \
+ #     python3 -m vllm.entrypoints.openai.api_server \
+ #     --model '${LLM_MODEL}' \
+ #     --max_model_len=1024 \
+ #     --host 0.0.0.0 \
+ #     --port 80"
```
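+
+ Once the vLLM container is up, you can optionally verify it is serving before pointing a pipeline at it; this check relies only on vLLM's standard OpenAI-compatible API rather than anything EdgeCraftRAG-specific:
+
+ ```bash
+ # Should return a JSON list containing the model you set as LLM_MODEL
+ curl http://${HOST_IP}:${VLLM_SERVICE_PORT}/v1/models | jq '.'
+ ```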
### ChatQnA with LLM Example (Command Line)
@@ -109,7 +182,7 @@ curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: app
# }

# Prepare data from local directory
- curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"#REPLACE WITH YOUR LOCAL DOC DIR#"}' | jq '.'
+ curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'

# Validate Mega Service
curl -X POST http://${HOST_IP}:16011/v1/chatqna -H "Content-Type: application/json" -d '{"messages":"#REPLACE WITH YOUR QUESTION HERE#", "top_n":5, "max_tokens":512}' | jq '.'
@@ -121,33 +194,14 @@ Open your browser, access http://${HOST_IP}:8082

> Your browser should be running on the same host as your console; otherwise you will need to access the UI with your host domain name instead of ${HOST_IP}.

- ### (Optional) Launch vLLM with OpenVINO service
+ To create a default pipeline, you need to click the `Create Pipeline` button on the `RAG Settings` page. You can also create multiple pipelines or update existing pipelines through `Pipeline Configuration`, but please note that active pipelines cannot be updated.
+ ![create_pipeline](assets/img/create_pipeline.png)

- ```bash
- # 1. export LLM_MODEL
- export LLM_MODEL="your model id"
- # 2. Uncomment below code in 'GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml'
- # vllm-service:
- #   image: vllm:openvino
- #   container_name: vllm-openvino-server
- #   depends_on:
- #     - vllm-service
- #   ports:
- #     - "8008:80"
- #   environment:
- #     no_proxy: ${no_proxy}
- #     http_proxy: ${http_proxy}
- #     https_proxy: ${https_proxy}
- #     vLLM_ENDPOINT: ${vLLM_ENDPOINT}
- #     LLM_MODEL: ${LLM_MODEL}
- #   entrypoint: /bin/bash -c "\
- #     cd / && \
- #     export VLLM_CPU_KVCACHE_SPACE=50 && \
- #     python3 -m vllm.entrypoints.openai.api_server \
- #     --model '${LLM_MODEL}' \
- #     --host 0.0.0.0 \
- #     --port 80"
- ```
+ After the pipeline is created, you can upload your data on the `Chatbot` page.
+ ![upload_data](assets/img/upload_data.png)
+
+ Then, you can submit messages in the chat box.
+ ![chat_with_rag](assets/img/chat_with_rag.png)

## Advanced User Guide

@@ -156,27 +210,13 @@ export LLM_MODEL="your model id"
#### Create a pipeline

```bash
- curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @examples/test_pipeline.json | jq '.'
- ```
-
- It will take some time to prepare the embedding model.
-
- #### Upload a text
-
- ```bash
- curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data.json | jq '.'
- ```
-
- #### Provide a query to retrieve context with similarity search.
-
- ```bash
- curl -X POST http://${HOST_IP}:16010/v1/retrieval -H "Content-Type: application/json" -d @examples/test_query.json | jq '.'
+ curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @tests/test_pipeline_local_llm.json | jq '.'
```

- #### Create the second pipeline test2
+ #### Update a pipeline

```bash
- curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @examples/test_pipeline2.json | jq '.'
+ curl -X PATCH http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" -d @tests/test_pipeline_local_llm.json | jq '.'
```

#### Check all pipelines
@@ -185,27 +225,18 @@ curl -X POST http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: app
curl -X GET http://${HOST_IP}:16010/v1/settings/pipelines -H "Content-Type: application/json" | jq '.'
```

- #### Compare similarity retrieval (test1) and keyword retrieval (test2)
+ #### Activate a pipeline

```bash
- # Activate pipeline test1
curl -X PATCH http://${HOST_IP}:16010/v1/settings/pipelines/test1 -H "Content-Type: application/json" -d '{"active": "true"}' | jq '.'
- # Similarity retrieval
- curl -X POST http://${HOST_IP}:16010/v1/retrieval -H "Content-Type: application/json" -d '{"messages":"number"}' | jq '.'
-
- # Activate pipeline test2
- curl -X PATCH http://${HOST_IP}:16010/v1/settings/pipelines/test2 -H "Content-Type: application/json" -d '{"active": "true"}' | jq '.'
- # Keyword retrieval
- curl -X POST http://${HOST_IP}:16010/v1/retrieval -H "Content-Type: application/json" -d '{"messages":"number"}' | jq '.'
-
```

### Model Management

#### Load a model

```bash
- curl -X POST http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: application/json" -d @examples/test_model_load.json | jq '.'
+ curl -X POST http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: application/json" -d '{"model_type": "reranker", "model_id": "BAAI/bge-reranker-large", "model_path": "./models/bge_ov_reranker", "device": "cpu"}' | jq '.'
```

It will take some time to load the model.
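+
+ If you want to confirm that loading has finished, one option is to poll the model entry (assuming the per-model GET path mirrors the PATCH/DELETE paths used below, as in the "Check a certain model" section):
+
+ ```bash
+ curl -X GET http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" | jq '.'
+ ```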
@@ -219,7 +250,7 @@ curl -X GET http://${HOST_IP}:16010/v1/settings/models -H "Content-Type: applica
#### Update a model

```bash
- curl -X PATCH http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" -d @examples/test_model_update.json | jq '.'
+ curl -X PATCH http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" -d '{"model_type": "reranker", "model_id": "BAAI/bge-reranker-large", "model_path": "./models/bge_ov_reranker", "device": "gpu"}' | jq '.'
```

#### Check a certain model
@@ -239,14 +270,14 @@ curl -X DELETE http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-larg
#### Add a text

```bash
- curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data.json | jq '.'
+ curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"text":"#REPLACE WITH YOUR TEXT"}' | jq '.'
```

#### Add files from an existing file path

```bash
- curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data_dir.json | jq '.'
- curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d @examples/test_data_file.json | jq '.'
+ curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
+ curl -X POST http://${HOST_IP}:16010/v1/data -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR FILE WITHIN MOUNTED DOC PATH#"}' | jq '.'
```

#### Check all files
@@ -270,5 +301,5 @@ curl -X DELETE http://${HOST_IP}:16010/v1/data/files/test2.docx -H "Content-Type
#### Update a file

```bash
- curl -X PATCH http://${HOST_IP}:16010/v1/data/files/test.pdf -H "Content-Type: application/json" -d @examples/test_data_file.json | jq '.'
+ curl -X PATCH http://${HOST_IP}:16010/v1/data/files/test.pdf -H "Content-Type: application/json" -d '{"local_path":"docs/#REPLACE WITH YOUR FILE WITHIN MOUNTED DOC PATH#"}' | jq '.'
```