[ModelRunner]Add profile execute duration observation #996

depeng1994 · 2025-05-28T13:31:47Z

What this PR does / why we need it?

We need to observe the time consumed by each stage of inference (pre-processing, model forward, and so on) without incurring any performance loss.
Therefore, we use the NPU's event timestamp mechanism to mark arbitrary stages during execution on the NPU device. The marking operation is executed asynchronously, so it causes no performance loss.
Additionally, we provide a blocking synchronization API, **pop_captured_sync**, which can be called at an appropriate time to print the time consumed in all observed stages.
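
To illustrate the idea, here is a minimal sketch of the record-now, read-later pattern described above. It is not the code in this PR: `StageTimer`, `HostEvent`, and `capture_async` are hypothetical names, and `HostEvent` is a CPU-only stand-in that reads the host clock so the example runs anywhere. On a real device the event factory would presumably be the device's timing event type, whose `record()` enqueues a marker on the stream without blocking. Only the name `pop_captured_sync` is taken from the description.

```python
import time
from contextlib import contextmanager


class HostEvent:
    """Hypothetical CPU-only stand-in for a device timing event."""

    def __init__(self):
        self._t = None

    def record(self):
        # A device event would enqueue a timestamp marker and return at once;
        # this stand-in simply reads the host clock.
        self._t = time.perf_counter()

    def synchronize(self):
        # A device event would block here until the marker has been reached.
        pass

    def elapsed_time(self, end):
        # Milliseconds between this event and `end`.
        return (end._t - self._t) * 1000.0


class StageTimer:
    """Hypothetical sketch of stage observation, not the PR's implementation."""

    def __init__(self, event_factory=HostEvent):
        self._event_factory = event_factory
        self._observations = []  # list of (tag, start_event, end_event)

    @contextmanager
    def capture_async(self, tag):
        # Record a start marker, run the stage, then record an end marker.
        # Nothing is read back here, so the hot path never waits.
        start = self._event_factory()
        start.record()
        try:
            yield
        finally:
            end = self._event_factory()
            end.record()
            self._observations.append((tag, start, end))

    def pop_captured_sync(self):
        # Blocking step: wait for each end marker, then compute durations.
        durations = {}
        while self._observations:
            tag, start, end = self._observations.pop()
            end.synchronize()
            durations[tag] = start.elapsed_time(end)
        return durations


timer = StageTimer()
with timer.capture_async("prepare input and forward"):
    with timer.capture_async("forward"):
        time.sleep(0.002)  # placeholder for real work
durations = timer.pop_captured_sync()
print(" ".join(f"[{tag}]:{ms:.2f}ms" for tag, ms in durations.items()))
```

The design point is that the only blocking call is `pop_captured_sync`, so the caller decides when to pay the synchronization cost.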

Only 5 lines of model_runner_v1.py were changed, all of them ProfileExecuteDuration() calls; nothing else was changed. The diff shows more changed lines because of alignment (indentation) adjustments.

Does this PR introduce any user-facing change?

Set the environment variable VLLM_MODEL_EXECUTE_TIME_OBSERVE to enable this feature.
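
As a rough sketch of how such a switch might be read, the helper below checks the variable once and returns a boolean. The accepted values (`1`, `true`, `yes`, `on`) and the helper name `observe_enabled` are assumptions for illustration; the description does not state which values the feature actually accepts.

```python
import os


def observe_enabled(environ=os.environ):
    # Assumption: the feature is off by default and a truthy-looking
    # string turns it on. The real parsing rule is not given in this PR.
    raw = environ.get("VLLM_MODEL_EXECUTE_TIME_OBSERVE", "0")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    print("observation enabled:", observe_enabled())
```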

How was this patch tested?

Tested with a DeepSeek model. The output looks like this:

5691:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.17ms [prepare input and forward]:9.57ms [forward]:4.14ms
5695:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.29ms [prepare input and forward]:10.19ms [forward]:4.14ms
5697:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.81ms [prepare input and forward]:10.29ms [forward]:3.99ms
5701:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.10ms [prepare input and forward]:10.62ms [forward]:4.33ms
5705:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.65ms [prepare input and forward]:9.58ms [forward]:4.20ms
5709:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.43ms [prepare input and forward]:9.88ms [forward]:4.20ms
5711:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.89ms [prepare input and forward]:10.49ms [forward]:4.19ms
5715:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.14ms [prepare input and forward]:11.21ms [forward]:4.18ms
5719:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.71ms [prepare input and forward]:10.15ms [forward]:4.42ms
5723:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.62ms [prepare input and forward]:10.31ms [forward]:4.25ms
5725:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.12ms [prepare input and forward]:10.33ms [forward]:4.24ms
5729:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.58ms [prepare input and forward]:10.85ms [forward]:4.32ms
5733:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.32ms [prepare input and forward]:9.79ms [forward]:4.28ms
5737:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:15.06ms [prepare input and forward]:9.89ms [forward]:4.32ms
5739:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.62ms [prepare input and forward]:10.48ms [forward]:4.27ms
5743:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.60ms [prepare input and forward]:10.71ms [forward]:4.61ms
5747:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.21ms [prepare input and forward]:10.10ms [forward]:4.52ms
5751:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:15.03ms [prepare input and forward]:10.00ms [forward]:4.42ms
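
For anyone who wants to summarize output like the above, here is a small sketch that extracts the per-stage timings from such lines and averages them. The function name `mean_stage_ms` and the regular expression are illustrative assumptions based only on the line format shown here; the two sample lines are copied from the output above without their leading line numbers.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches "[stage name]:12.34ms"; "[Decode]: " is skipped because a digit
# must follow the colon directly.
STAGE_RE = re.compile(r"\[([^\]]+)\]:(\d+(?:\.\d+)?)ms")


def mean_stage_ms(lines):
    # Collect every timing per stage, then average each stage.
    samples = defaultdict(list)
    for line in lines:
        for stage, ms in STAGE_RE.findall(line):
            samples[stage].append(float(ms))
    return {stage: mean(values) for stage, values in samples.items()}


sample = [
    "(IntegratedWorker pid=1502285) Profile execute duration [Decode]: "
    "[post process]:14.17ms [prepare input and forward]:9.57ms [forward]:4.14ms",
    "(IntegratedWorker pid=1502285) Profile execute duration [Decode]: "
    "[post process]:14.29ms [prepare input and forward]:10.19ms [forward]:4.14ms",
]
averages = mean_stage_ms(sample)
for stage, ms in averages.items():
    print(f"{stage}: {ms:.2f} ms")
```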

github-actions bot added the module:core label May 28, 2025

depeng1994 force-pushed the main branch 2 times, most recently from 383aaa3 to c6e01fe Compare May 29, 2025 06:44

depeng1994 changed the title ~~add profile execute time dot~~ Add profile execute duration observation May 29, 2025

depeng1994 force-pushed the main branch 2 times, most recently from e77b9ec to 2dc97f1 Compare May 29, 2025 10:21

depeng1994 changed the title ~~Add profile execute duration observation~~ [ModelRunner]Add profile execute duration observation May 29, 2025

Add profile execute duration observation

800a544

depeng1994 force-pushed the main branch from 2dc97f1 to 800a544 Compare May 29, 2025 10:43

depeng1994 closed this May 29, 2025
