[CI] Refactor CI #952

Merged (1 commit, May 27, 2025)
78 changes: 21 additions & 57 deletions .github/workflows/vllm_ascend_test.yaml
@@ -30,32 +30,27 @@ on:
- '.github/workflows/vllm_ascend_test.yaml'
- '!docs/**'
- 'pytest.ini'

# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}

concurrency:
group: pr-${{ github.event.pull_request.number }}
cancel-in-progress: true

jobs:
test:
strategy:
max-parallel: 2
matrix:
os: [linux-arm64-npu-1, linux-arm64-npu-4]
vllm_verison: [main, v0.8.5.post1]
vllm_version: [main, v0.8.5.post1]
concurrency:
group: >
${{
matrix.os == 'linux-arm64-npu-4'
&& github.event.pull_request.number
&& format('pr-{0}-limit-npu-4', github.event.pull_request.number)
|| format('job-{0}-{1}-{2}', matrix.os, matrix.vllm_verison, github.event.pull_request.number)
|| format('job-{0}-{1}-{2}', matrix.os, matrix.vllm_version, github.event.pull_request.number)
}}
cancel-in-progress: false
name: vLLM Ascend test
@@ -66,6 +61,7 @@ jobs:
env:
HF_ENDPOINT: https://hf-mirror.com
HF_TOKEN: ${{ secrets.HF_TOKEN }}
VLLM_LOGGING_LEVEL: ERROR
steps:
- name: Check npu and CANN info
run: |
@@ -92,7 +88,7 @@ jobs:
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_verison }}
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty

- name: Install vllm-project/vllm from source
@@ -111,64 +107,32 @@ jobs:
VLLM_WORKER_MULTIPROC_METHOD: spawn
run: |
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
pytest -sv tests/singlecard/test_offline_inference.py
pytest -sv tests/singlecard/test_ilama_lora.py
pytest -sv tests/ops
pytest -sv tests/compile
VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
# AscendScheduler doesn't work, fix it later
Review comment from @wangxiyuan (Collaborator, Author), May 27, 2025:

this should be fixed by @zzzzwwjj in #939

# pytest -sv tests/singlecard/tets_schedule.py
# guided decoding doesn't work, fix it later
Review comment from @wangxiyuan (Collaborator, Author), May 27, 2025:

this should be fixed by @shen-shanshan in #969

# pytest -sv tests/singlecard/test_guided_decoding.py.py
pytest -sv tests/singlecard/ --ignore=tests/singlecard/test_offline_inference.py --ignore=tests/singlecard/test_scheduler.py --ignore=tests/singlecard/test_guided_decoding.py
else
pytest -sv -k "QwQ" tests/multicard/test_offline_inference_distributed.py
pytest -sv tests/multicard/test_ilama_lora_tp2.py
pytest -sv tests/ops
pytest -sv tests/compile
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py
fi

- name: Run vllm-project/vllm-ascend test on V0 engine
env:
VLLM_USE_V1: 0
run: |
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
pytest -sv tests/singlecard/test_ilama_lora.py
pytest -sv tests/singlecard/test_offline_inference.py
pytest -sv tests/ops
VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
# AscendScheduler doesn't work, fix it later
Review comment (Collaborator, Author):

ditto @zzzzwwjj

# pytest -sv tests/singlecard/tets_schedule.py
# guided decoding doesn't work, fix it later
# pytest -sv tests/singlecard/test_guided_decoding.py.py
pytest -sv tests/singlecard/ --ignore=tests/singlecard/test_offline_inference.py --ignore=tests/singlecard/test_scheduler.py --ignore=tests/singlecard/test_guided_decoding.py
else
pytest -sv tests/multicard/test_ilama_lora_tp2.py
pytest -sv -k "QwQ" tests/multicard/test_offline_inference_distributed.py
pytest -sv -k "DeepSeek" tests/multicard/test_offline_inference_distributed.py
pytest -sv tests/ops
fi

# only run test on spec decode when the related code changed
- name: Check for changes in Speculative Decode
if: github.event_name != 'schedule'
id: filter_spec_decode
uses: dorny/paths-filter@v3
with:
filters: |
speculative_tests_changed:
- ".github/workflows/vllm_ascend_test.yaml"
- "tests/singlecard/spec_decode/**"
- "tests/multicard/spec_decode_e2e/**"
- "vllm_ascend/worker/worker.py"
- "vllm_ascend/worker/model_runner.py"
- "vllm_ascend/worker/multi_step_runner.py"
- "vllm_ascend/worker/multi_step_worker.py"
- "vllm_ascend/worker/draft_model_runner.py"
- "vllm_ascend/patch/worker/patch_common/patch_metrics.py"
- "vllm_ascend/patch/worker/patch_common/patch_spec_decode_worker.py"
- "vllm_ascend/patch/worker/patch_common/patch_multi_step_worker.py"

- name: Run vllm-project/vllm-ascend Speculative Decode test
if: steps.filter_spec_decode.outputs.speculative_tests_changed == 'true' || github.event_name == 'schedule'
run: |
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
VLLM_USE_MODELSCOPE=true pytest -sv tests/singlecard/spec_decode/e2e/test_v1_spec_decode.py
pytest -sv tests/singlecard/spec_decode/e2e/test_mtp_correctness.py # it needs a clean process
pytest -sv tests/singlecard/spec_decode --ignore=tests/singlecard/spec_decode/e2e/test_mtp_correctness.py --ignore=tests/singlecard/spec_decode/e2e/test_v1_spec_decode.py
# Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py --ignore=tests/multicard/test_offline_inference_distributed.py
fi

- name: Run vllm-project/vllm test for V0 Engine
env:
VLLM_USE_V1: 0
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
run: |
pytest -sv
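
For context on the change-detection step above: a minimal, self-contained sketch of the dorny/paths-filter@v3 gating pattern it uses. Only the filter id, the filter name, and the downstream `if:` check mirror the workflow; the job name, paths, and final command are placeholders, not part of this PR.

```yaml
# Sketch of path-based test gating with dorny/paths-filter@v3.
# The action exposes one output per filter key ('true'/'false'); a later
# step checks that output, or the schedule event, before running tests.
name: example-paths-filter-gate
on:
  pull_request:
  schedule:
    - cron: '0 23 * * *'   # placeholder nightly trigger

jobs:
  gated-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check for changes in Speculative Decode
        if: github.event_name != 'schedule'
        id: filter_spec_decode
        uses: dorny/paths-filter@v3
        with:
          filters: |
            speculative_tests_changed:
              - "tests/singlecard/spec_decode/**"
              - "vllm_ascend/worker/**"

      - name: Run Speculative Decode tests
        if: steps.filter_spec_decode.outputs.speculative_tests_changed == 'true' || github.event_name == 'schedule'
        run: echo "spec decode sources changed (or scheduled run), running the gated tests"
```
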
98 changes: 98 additions & 0 deletions .github/workflows/vllm_ascend_test_long_term.yaml
@@ -0,0 +1,98 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: 'e2e test / long-term-test'

on:
schedule:
# Runs at 23:00 UTC (7:00 AM Beijing) every day
- cron: '0 23 * * *'
pull_request:
types: [ labeled ]

# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}

concurrency:
group: pr-${{ github.event.pull_request.number }}
cancel-in-progress: true

jobs:
long-term-test:
# long-term-test will be triggered when tag 'long-term-test' & 'ready-for-test' or schedule job
if: ${{ contains(github.event.pull_request.labels.*.name, 'long-term-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
strategy:
max-parallel: 2
matrix:
vllm_version: [main, v0.8.5.post1]
name: vLLM Ascend long term test
runs-on: linux-arm64-npu-1
container:
# TODO(yikun): Remove m.daocloud.io prefix when infra proxy ready
image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
env:
HF_ENDPOINT: https://hf-mirror.com
HF_TOKEN: ${{ secrets.HF_TOKEN }}
VLLM_LOGGING_LEVEL: ERROR
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info

- name: Config mirrors
run: |
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt-get update -y
apt install git -y
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/

- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4

- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev

- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty

- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .

- name: Install vllm-project/vllm-ascend
run: |
pip install -r requirements-dev.txt
pip install -v -e .

- name: Run vllm-project/vllm-ascend long term test
run: |
# spec decode test
VLLM_USE_MODELSCOPE=true pytest -sv tests/long_term/spec_decode/e2e/test_v1_spec_decode.py
VLLM_USE_MODELSCOPE=True pytest -sv tests/long_term/spec_decode/e2e/test_mtp_correctness.py # it needs a clean process
pytest -sv tests/long_term/spec_decode --ignore=tests/long_term/spec_decode/e2e/test_mtp_correctness.py --ignore=tests/long_term/spec_decode/e2e/test_v1_spec_decode.py
Review comment (Collaborator):

also cc @mengwei805: the spec decode tests will be triggered manually after this PR.

Review comment (Collaborator, Author):

See the commit message:

1. Decide which tests need to run, then label the PR accordingly; the label can be `long-term-test`, `pd-test`, or both.
2. Add the `ready-for-test` label; the selected tests will then run (a minimal sketch of the resulting trigger condition is shown below).

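A minimal sketch of the resulting trigger condition, assuming the label names above. The `if:` expression mirrors the one added to vllm_ascend_test_long_term.yaml in this diff (the pd workflow uses the same pattern with the `pd-test` label); the step body is only a placeholder.

```yaml
# Label-gated job: runs when the PR carries both the test-selection label
# ('long-term-test' here, 'pd-test' for the PD workflow) and 'ready-for-test',
# or when the nightly schedule fires.
on:
  schedule:
    - cron: '0 23 * * *'
  pull_request:
    types: [ labeled ]   # re-evaluated whenever a label is added

jobs:
  long-term-test:
    if: ${{ contains(github.event.pull_request.labels.*.name, 'long-term-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
    runs-on: linux-arm64-npu-1
    steps:
      - name: Run the selected tests
        run: echo "required labels present (or scheduled run), tests execute"
```
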
Review comment (Collaborator):

I consider this a wonderful refactor.

13 changes: 9 additions & 4 deletions .github/workflows/vllm_ascend_test_pd.yaml
@@ -30,13 +30,18 @@ defaults:
run:
shell: bash -el {0}

concurrency:
group: pr-${{ github.event.pull_request.number }}
cancel-in-progress: true

jobs:
test:
if: ${{ github.event.label.name == 'module:pd' }}
prefilling-decoding-disaggregation:
# pd-test will be triggered when tag 'pd-test' & 'ready-for-test' or schedule job
if: ${{ contains(github.event.pull_request.labels.*.name, 'pd-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
strategy:
matrix:
vllm_verison: [v0.8.5.post1]
name: vLLM Ascend test
vllm_verison: [main, v0.8.5.post1]
name: vLLM Ascend prefilling decoding disaggregation test
runs-on: linux-arm64-npu-static-8

container:
3 changes: 1 addition & 2 deletions format.sh
@@ -272,9 +272,8 @@ echo 'vllm-ascend isort: Done'

# Clang-format section
# Exclude some files for formatting because they are vendored
# NOTE: Keep up to date with .github/workflows/clang-format.yml
CLANG_FORMAT_EXCLUDES=(
'csrc/kernels/pos_encoding_kernels.cpp'
'csrc/kernels/pos_encoding_kernels.cpp' 'csrc/kernels/advance_step.cpp' 'csrc/torch_binding.cpp' 'csrc/ops.h'
)

# Format specified files with clang-format
File renamed without changes.
@@ -20,13 +20,10 @@
import shutil
from itertools import cycle
from pathlib import Path
from typing import List, Optional, Sequence, Tuple, Union
from typing import Optional, Sequence, Union

import pytest
import torch
from vllm import LLM, SamplingParams
from vllm.distributed import cleanup_dist_env_and_memory
from vllm.model_executor.utils import set_random_seed
from vllm import SamplingParams
from vllm.sequence import PromptLogprobs, SampleLogprobs

from ....model_utils import (TokensTextLogprobs,
@@ -45,65 +42,6 @@
]


@pytest.fixture
def test_llm_generator(common_llm_kwargs, per_test_common_llm_kwargs,
test_llm_kwargs, seed):

def generate():
kwargs = {
**common_llm_kwargs,
**per_test_common_llm_kwargs,
**test_llm_kwargs,
}

llm = LLM(**kwargs)

if seed is not None:
set_random_seed(seed)

yield llm

del llm
cleanup_dist_env_and_memory()

return generate


def maybe_assert_ngram_worker(llm):
# Verify the proposer worker is ngram if ngram is specified.
if (llm.llm_engine.speculative_config is not None
and llm.llm_engine.speculative_config.method == "ngram"):
from vllm.spec_decode.ngram_worker import NGramWorker
assert isinstance(
llm.llm_engine.model_executor.driver_worker.proposer_worker,
NGramWorker)


def get_output_from_llm_generator(
llm_generator, prompts,
sampling_params) -> Tuple[List[str], List[List[int]], float]:
tokens: List[str] = []
token_ids: List[List[int]] = []
acceptance_rate: float = -1.0
for llm in llm_generator():
maybe_assert_ngram_worker(llm)

outputs = llm.generate(prompts, sampling_params, use_tqdm=True)

token_ids = [output.outputs[0].token_ids for output in outputs]
tokens = [output.outputs[0].text for output in outputs]

# Fetch acceptance rate if logging is enabled.
if stat_loggers := getattr(llm.llm_engine, "stat_loggers", None):
stat_logger = stat_loggers["prometheus"]
acceptance_rate = (stat_logger.metrics.
gauge_spec_decode_draft_acceptance_rate.labels(
**stat_logger.labels)._value.get())
del llm

return tokens, token_ids, acceptance_rate


def check_logprobs_correctness(
spec_outputs: Sequence[Union[TokensTextLogprobs,
TokensTextLogprobsPromptLogprobs]],
@@ -41,9 +41,9 @@

import pytest

from tests.singlecard.spec_decode.e2e.conftest import \
from tests.long_term.spec_decode.e2e.conftest import \
run_equality_correctness_test
from tests.singlecard.spec_decode.utils import maybe_enable_chunked_prefill
from tests.long_term.spec_decode.utils import maybe_enable_chunked_prefill

# main model
# lmsys/vicuna-7b-v1.3 was to be used but it's causing
@@ -443,8 +443,3 @@ def test_mqa_scorer(vllm_runner, common_llm_kwargs, per_test_common_llm_kwargs,
max_output_len=output_len,
seed=seed,
temperature=0.0)


if __name__ == "__main__":
import pytest
pytest.main([__file__])
@@ -41,9 +41,9 @@
from vllm.model_executor.layers.vocab_parallel_embedding import \
pad_vocab_size # noqa: F401

from tests.singlecard.spec_decode.e2e.conftest import \
from tests.long_term.spec_decode.e2e.conftest import \
run_equality_correctness_test
from tests.singlecard.spec_decode.utils import maybe_enable_chunked_prefill
from tests.long_term.spec_decode.utils import maybe_enable_chunked_prefill

# main model
MAIN_MODEL = "JackFram/llama-160m"
@@ -57,7 +57,6 @@

# precision
PRECISION = "bfloat16"
os.environ["VLLM_USE_MODELSCOPE"] = "True"


@pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "1",
@@ -450,8 +449,3 @@ def test_mtp_disable_queue(vllm_runner, common_llm_kwargs,
per_test_common_llm_kwargs,
baseline_llm_kwargs, test_llm_kwargs,
batch_size, output_len, seed)


if __name__ == "__main__":
import pytest
pytest.main([__file__])