
[bugfix] torch profiler bug for single gpu with GPUExecutor #8354


Merged: 4 commits into vllm-project:main on Sep 13, 2024

Conversation

@SolitaryThinker (Contributor) commented Sep 11, 2024

GPUExecutor has a different API and does not define a _run_workers method.

Another way to fix this would be to define a _run_workers API in GPUExecutor (it would only call the driver_worker) to match the API of the other executors; that would avoid the extra import.

closes #8326 and closes #8351
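
For readers skimming the thread, here is a minimal sketch of the dispatch this fix describes, written against vLLM's executor layout at the time. Treat the exact names (driver_worker, _run_workers, the import path) as assumptions rather than the literal patch:

```python
from vllm.executor.gpu_executor import GPUExecutor

def start_profile(model_executor) -> None:
    # GPUExecutor (single GPU) does not define _run_workers, so the
    # profiler call has to go straight to its driver_worker.
    if type(model_executor) is GPUExecutor:
        model_executor.driver_worker.start_profile()
    else:
        # The other executors share a _run_workers API that fans the
        # call out to every worker.
        model_executor._run_workers("start_profile")
```

The "extra import" mentioned above is the GPUExecutor import that this type check pulls into the engine module.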


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@SolitaryThinker (Contributor, Author)

/ready

@njhill (Member) left a comment:

Different executor class for non-async case
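
(Context, offered as an assumption about vLLM's class layout rather than something stated in this thread: the async engine wraps the executor in GPUExecutorAsync, a subclass of GPUExecutor, so the synchronous LLMEngine has to check against GPUExecutor itself. A one-line check, assuming both classes live in vllm.executor.gpu_executor:)

```python
from vllm.executor.gpu_executor import GPUExecutor, GPUExecutorAsync

# If this holds, an exact type() check against GPUExecutor matches only
# the synchronous engine's executor, never the async one.
assert issubclass(GPUExecutorAsync, GPUExecutor)
```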

@SolitaryThinker (Contributor, Author)

Nice catch @njhill! I'm going to use type() to match on GPUExecutor instead, as isinstance would also catch the multiproc executor used for TP > 1 and would then call start_profile only on the driver_worker instead of on all workers.
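
A toy illustration of the isinstance vs. type() distinction, using stand-in class stubs (not vLLM code) that assume the multiproc executor inherits from GPUExecutor:

```python
class GPUExecutor: ...                                  # single-GPU executor
class DistributedGPUExecutor(GPUExecutor): ...          # defines _run_workers
class MultiprocessingGPUExecutor(DistributedGPUExecutor): ...  # used for TP > 1

executor = MultiprocessingGPUExecutor()
# isinstance matches subclasses, so the multiproc executor would wrongly
# take the single-GPU branch and profile only the driver worker:
print(isinstance(executor, GPUExecutor))  # True
# An exact type check falls through to the _run_workers branch instead:
print(type(executor) is GPUExecutor)      # False
```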

@SolitaryThinker (Contributor, Author)

/ready

@comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Sep 12, 2024
@comaniac comaniac enabled auto-merge (squash) September 12, 2024 19:19
@SolitaryThinker (Contributor, Author)

would be good to get a force merge @simon-mo @youkaichao

@DarkLight1337 (Member)

Can you investigate whether the failure in the LLaVA test is related to your changes?

[2024-09-12T20:14:39Z] INFO 09-12 13:14:39 model_runner.py:997] Starting to load model llava-hf/llava-v1.6-mistral-7b-hf...
[2024-09-12T20:14:39Z] INFO 09-12 13:14:39 weight_utils.py:242] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 100% 4/4 [00:00<00:00, 37.70it/s]
[2024-09-12T20:14:42Z] INFO 09-12 13:14:42 model_runner.py:1008] Loading model weights took 14.0711 GB
[2024-09-12T20:14:42Z] WARNING 09-12 13:14:42 model_runner.py:1176] Computed max_num_seqs (min(256, 10240 // 11712)) to be less than 1. Setting it to the minimum value of 1.
[2024-09-12T20:16:16Z] FAILED
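
For what it's worth, the arithmetic in that warning line checks out on its own (operand meanings are inferred from the log, not from this PR):

```python
# 10240 // 11712 floors to 0 under integer division, so the computed
# max_num_seqs is min(256, 0) == 0, which vLLM then clamps to 1 as the
# warning reports.
print(min(256, 10240 // 11712))  # 0
```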

@simon-mo simon-mo disabled auto-merge September 13, 2024 04:29
@simon-mo simon-mo merged commit ba77527 into vllm-project:main Sep 13, 2024
48 of 52 checks passed
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
5 participants