curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
  -H "Content-Type: application/json"
```

### Profile Microservices

To further analyze MicroService performance, users can follow the instructions below to profile MicroServices.

#### 1. vLLM backend Service

Users can follow the previous section to test the vLLM MicroService or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.

##### Start vLLM profiling

```bash
curl http://${host_ip}:9009/start_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
```

Users should see the docker logs below from vllm-service if profiling started correctly.

```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```

After vLLM profiling is started, users can start asking questions and get responses from the vLLM MicroService
or the ChatQnA MegaService.

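For example, a completion request like the one below keeps the profiler busy while it runs (a minimal sketch: it assumes the vllm-service on port `9009` exposes the standard OpenAI-compatible `/v1/completions` route, which is not stated in this section):

```bash
# Hedged example: send one completion request to the vLLM service while the
# profiler is active, so the trace captures real inference work.
# Assumes the OpenAI-compatible /v1/completions route on port 9009.
curl http://${host_ip}:9009/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is deep learning?", "max_tokens": 32}'
```
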
##### Stop vLLM profiling

With the following command, users can stop vLLM profiling and generate a \*.pt.trace.json.gz file as the profiling result
under the /mnt folder in the vllm-service docker instance.

```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
```

Users should see the docker logs below from vllm-service if profiling stopped correctly.

```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```

After vLLM profiling is stopped, users can use the command below to copy the \*.pt.trace.json.gz file out of the /mnt folder.

```bash
docker cp vllm-service:/mnt/ .
```
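
Once copied, a quick listing confirms the trace reached the host (a minimal sketch; the exact trace file names are generated by vLLM and will differ per run):

```bash
# `docker cp` above copies the container's /mnt directory to ./mnt on the host;
# list the trace files it contains (file names are generated by vLLM).
ls -lh mnt/*.pt.trace.json.gz
```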

##### Check profiling result

Open a web browser and go to `chrome://tracing` or `ui.perfetto.dev`, then load the json.gz file. You should be able
to see the vLLM profiling result as in the diagram below.
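
If the trace viewer does not accept the compressed file directly, it can be decompressed first with standard gzip (a minimal sketch; nothing vLLM-specific is assumed):

```bash
# Decompress the trace while keeping the original .gz file (-k).
gunzip -k mnt/*.pt.trace.json.gz
```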

## 🚀 Launch the UI
### Launch with origin port