
Commit 7adbba6

Enable vLLM Profiling for ChatQnA (#1124)
1 parent 0d52c2f commit 7adbba6

2 files changed: +52 -0 lines changed

ChatQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 51 additions & 0 deletions
@@ -432,6 +432,57 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
  -H "Content-Type: application/json"
```

### Profile Microservices

To further analyze microservice performance, users can follow the instructions below to profile the microservices.

#### 1. vLLM backend Service

Users can follow the previous sections to test the vLLM microservice or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.
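
Profiling requires the `VLLM_TORCH_PROFILER_DIR` environment variable to be set for the vLLM service; this commit sets it to `/mnt` in `compose_vllm.yaml` (see the compose change below). As a quick sanity check, assuming the container is named `vllm-service` as elsewhere in this guide, the variable can be inspected with:

```bash
# Check that the profiler output directory is configured in the running vLLM container
# (assumes the container name "vllm-service", as used in the docker cp command below)
docker exec vllm-service printenv VLLM_TORCH_PROFILER_DIR
# Expected output: /mnt
```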

##### Start vLLM profiling

```bash
curl http://${host_ip}:9009/start_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
```

If profiling starts correctly, the vllm-service Docker logs show messages like the following.

```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```

After vLLM profiling is started, users can start asking questions and getting responses from the vLLM microservice or the ChatQnA MegaService, for example as shown below.
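
A minimal example request sent directly to the vLLM microservice while profiling is active; the port (9009) and model name are taken from the profiling commands above, and the prompt is only an illustration:

```bash
# Example completion request that will be captured by the profiler
curl http://${host_ip}:9009/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
```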

##### Stop vLLM profiling

With the following command, users can stop vLLM profiling and generate a `*.pt.trace.json.gz` file as the profiling result
under the `/mnt` folder inside the vllm-service Docker container.

```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
```

If profiling stops correctly, the vllm-service Docker logs show messages like the following.

```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```

After vLLM profiling is stopped, users can use the command below to copy the `*.pt.trace.json.gz` file out of the `/mnt` folder.

```bash
docker cp vllm-service:/mnt/ .
```
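
The copied directory lands in the current working directory; a quick check that the trace file arrived (assuming the folder is copied as `./mnt`):

```bash
# List the profiler traces copied from the container
ls ./mnt/*.pt.trace.json.gz
```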

##### Check profiling result

Open a web browser, go to `chrome://tracing` or `ui.perfetto.dev`, and load the `*.pt.trace.json.gz` file. You should then see the vLLM profiling result, as in the diagram below.

![image](https://github.com/user-attachments/assets/55c7097e-5574-41dc-97a7-5e87c31bc286)

## 🚀 Launch the UI
### Launch with origin port

ChatQnA/docker_compose/intel/cpu/xeon/compose_vllm.yaml

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@ services:
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      VLLM_TORCH_PROFILER_DIR: "/mnt"
    command: --model $LLM_MODEL_ID --host 0.0.0.0 --port 80
  chatqna-xeon-backend-server:
    image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
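
After adding `VLLM_TORCH_PROFILER_DIR`, the vLLM container must be recreated so the new environment variable takes effect. A minimal sketch, assuming the compose file is launched from its own directory as in the deployment steps of the README:

```bash
# Re-run compose; only containers whose configuration changed are recreated,
# so the vLLM container comes back up with VLLM_TORCH_PROFILER_DIR set
docker compose -f compose_vllm.yaml up -d
```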
