# [ModelRunner] Add profile execute duration observation #1013
base: main
## Conversation
Overall lgtm, just some suggestions:
@wangxiyuan @Yikun @ganyi1996ppo please take a look, tks
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
Force-pushed from 25ee163 to 9967032.
vllm_ascend/envs.py (Outdated)
@@ -36,6 +36,8 @@
    lambda: bool(int(os.getenv("COMPILE_CUSTOM_KERNELS", "1"))),
    "VLLM_ENABLE_MC2":
    lambda: bool(int(os.getenv("VLLM_ENABLE_MC2", '0'))),
    "VLLM_MODEL_EXECUTE_TIME_OBSERVE":
"VLLM_MODEL_EXECUTE_TIME_OBSERVE": | |
"VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE": |
fixed
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
The commit message should also be updated.
The PR is good enough, just some nits; see the comments inline.
You can choose to address them in a separate PR.
* Use the non-blocking API `ProfileExecuteDuration().capture_async` to set observation points asynchronously when you need to observe the execution duration.
* Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.
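A minimal usage sketch of the two APIs above; the import path, the context-manager form of `capture_async`, and the helper `preprocess` are assumptions based on this PR's description rather than its final code:

```python
from vllm_ascend.utils import ProfileExecuteDuration  # assumed import path

def execute_once(model, raw_inputs):
    # Non-blocking: enqueue NPU event timestamps around each stage.
    with ProfileExecuteDuration().capture_async("prepare input"):
        batch = preprocess(raw_inputs)  # hypothetical pre-processing step
    with ProfileExecuteDuration().capture_async("forward"):
        hidden_states = model(batch)

    # Blocking: synchronize events and collect {tag: duration_ms}.
    durations = ProfileExecuteDuration().pop_captured_sync()
    for tag, duration in durations.items():
        print(f"[{tag}]:{duration:.2f}ms")
    return hidden_states
```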
## Example Output
The doc is good, but we could provide an e2e guide to help devs understand. For example:
We already added the key stages of inference (including pre-processing, model forward, etc.), so you can run the inference script:
`VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py`
    for tag, duration in durations.items()
]
captured_name = "Decode" if self.attn_state == AscendAttentionState.DecodeOnly else "Prefill"
print(f"Profile execute duration [{captured_name}]:",
print or log?
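For comparison, a logger-based variant of the `print` call above; this is a standalone sketch with made-up sample durations, not the PR's code:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Made-up captured data in the same {tag: duration_ms} shape as above.
durations = {"prepare input": 1.23, "forward": 4.56}
captured_name = "Prefill"

# Routed through logging instead of print(), so output honors the
# process's configured log level and handlers.
logger.info("Profile execute duration [%s]: %s", captured_name,
            " ".join(f"[{tag}]:{d:.2f}ms" for tag, d in durations.items()))
```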
### What this PR does / why we need it?
We need to observe the time consumed in each stage of inference (including pre-processing, model forward, etc.) without any performance loss. Therefore, we use the NPU's event timestamp mechanism to mark any stage during execution on the NPU device (this marking operation is executed asynchronously, with no performance loss). Additionally, we provide a blocking synchronization API `pop_captured_sync` to be called at an appropriate time, which prints the time consumed in all observed stages.
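As a rough illustration of that event-timestamp mechanism, here is a sketch assuming `torch_npu` is installed and that `torch.npu.Event` mirrors the `torch.cuda.Event` timing API (`model` and `batch` are hypothetical):

```python
import torch
import torch_npu  # noqa: F401  # assumed: registers the torch.npu backend

start = torch.npu.Event(enable_timing=True)
end = torch.npu.Event(enable_timing=True)

start.record()         # enqueued asynchronously on the NPU stream: no sync
output = model(batch)  # hypothetical forward pass being observed
end.record()

end.synchronize()      # the only blocking step: wait for the end event
print(f"forward: {start.elapsed_time(end):.2f} ms")
```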
The model_runner_v1.py file changed only 5 lines, all of which are `ProfileExecuteDuration()` calls; nothing else was changed. The diff appears larger than that due to re-indentation.
### Does this PR introduce any user-facing change?
Use the env variable `VLLM_MODEL_EXECUTE_TIME_OBSERVE` to enable this feature.
### How was this patch tested?
Tested with the DeepSeek model; it prints output like this: