Commit e2a0c19
[CI] Refactor CI (#952)
1. Remove some useless test functions and files.
2. Fix the format.sh problem.
3. Enable the full test suite for singlecard and multicard runs.
4. Move long-running tests to the `long_term` folder. These tests only run on labeled PRs and in the daily test. They include spec decode and the accuracy test.

## After refactor

There are 4 test modules:

- `singlecard`: tests that run on one NPU. Run for each PR and in the daily test.
- `multicard`: tests that run on multiple NPUs. Run for each PR and in the daily test.
- `long_term`: tests that take a long time (currently spec decode and the accuracy test). Run for PRs labeled `long-term-test` and in the daily test.
- `e2e`: tests for the docs and the PD (prefill/decode disaggregation) feature. Run for PRs labeled `pd-test` and in the daily test.

## Todo

1. Some tests are skipped; they should be fixed and re-enabled in the future.
2. The pyhccl test for multicard doesn't work at all. It should be enabled as well.
3. Ensure long-term-test passes in the daily test.

### Known issue

The `ready` label is currently required to start the PD test or the long-term test, and when `long-term-test` or `pd-test` is added after the other, the previously labeled test is re-run. So labeled tests should be started in the following order:

1. Decide which tests need to run, then apply the matching label: `long-term-test`, `pd-test`, or both.
2. Add the `ready-for-test` label; the tests will then run.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1 parent 9f5ab59 commit e2a0c19

34 files changed: +171 −1288 lines
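Roughly, the new layout maps onto local pytest invocations like the following. This is an illustrative sketch rather than part of the commit: the environment variable and path details mirror the CI steps shown in the workflow diff below, and the `tests/e2e/` path is inferred from the module description above rather than from this diff.

```bash
# Illustrative sketch: running the refactored test modules locally.
# Paths follow the layout described above; tests/e2e/ is assumed and not shown in this diff.

# singlecard: single-NPU tests, run on every PR and in the daily job
VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/

# multicard: multi-NPU tests, run on every PR and in the daily job
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/

# long_term: slow tests (spec decode, accuracy), gated behind the long-term-test label
pytest -sv tests/long_term/

# e2e: doc and prefill/decode-disaggregation tests, gated behind the pd-test label
pytest -sv tests/e2e/
```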

.github/workflows/vllm_ascend_test.yaml
Lines changed: 21 additions & 57 deletions

@@ -30,32 +30,27 @@ on:
       - '.github/workflows/vllm_ascend_test.yaml'
       - '!docs/**'
       - 'pytest.ini'
-
 # Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
 # declared as "shell: bash -el {0}" on steps that need to be properly activated.
 # It's used to activate ascend-toolkit environment variables.
 defaults:
   run:
     shell: bash -el {0}

-concurrency:
-  group: pr-${{ github.event.pull_request.number }}
-  cancel-in-progress: true
-
 jobs:
   test:
     strategy:
       max-parallel: 2
       matrix:
         os: [linux-arm64-npu-1, linux-arm64-npu-4]
-        vllm_verison: [main, v0.8.5.post1]
+        vllm_version: [main, v0.8.5.post1]
     concurrency:
       group: >
         ${{
         matrix.os == 'linux-arm64-npu-4'
         && github.event.pull_request.number
         && format('pr-{0}-limit-npu-4', github.event.pull_request.number)
-        || format('job-{0}-{1}-{2}', matrix.os, matrix.vllm_verison, github.event.pull_request.number)
+        || format('job-{0}-{1}-{2}', matrix.os, matrix.vllm_version, github.event.pull_request.number)
         }}
       cancel-in-progress: false
     name: vLLM Ascend test
@@ -66,6 +61,7 @@ jobs:
     env:
       HF_ENDPOINT: https://hf-mirror.com
       HF_TOKEN: ${{ secrets.HF_TOKEN }}
+      VLLM_LOGGING_LEVEL: ERROR
     steps:
       - name: Check npu and CANN info
         run: |
@@ -92,7 +88,7 @@ jobs:
         uses: actions/checkout@v4
         with:
           repository: vllm-project/vllm
-          ref: ${{ matrix.vllm_verison }}
+          ref: ${{ matrix.vllm_version }}
           path: ./vllm-empty

       - name: Install vllm-project/vllm from source
@@ -111,64 +107,32 @@ jobs:
          VLLM_WORKER_MULTIPROC_METHOD: spawn
        run: |
          if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
-           pytest -sv tests/singlecard/test_offline_inference.py
-           pytest -sv tests/singlecard/test_ilama_lora.py
-           pytest -sv tests/ops
-           pytest -sv tests/compile
+           VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
+           # AscendScheduler doesn't work, fix it later
+           # pytest -sv tests/singlecard/tets_schedule.py
+           # guided decoding doesn't work, fix it later
+           # pytest -sv tests/singlecard/test_guided_decoding.py.py
+           pytest -sv tests/singlecard/ --ignore=tests/singlecard/test_offline_inference.py --ignore=tests/singlecard/test_scheduler.py --ignore=tests/singlecard/test_guided_decoding.py
          else
-           pytest -sv -k "QwQ" tests/multicard/test_offline_inference_distributed.py
            pytest -sv tests/multicard/test_ilama_lora_tp2.py
-           pytest -sv tests/ops
-           pytest -sv tests/compile
+           VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py
          fi

      - name: Run vllm-project/vllm-ascend test on V0 engine
        env:
          VLLM_USE_V1: 0
        run: |
          if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
-           pytest -sv tests/singlecard/test_ilama_lora.py
-           pytest -sv tests/singlecard/test_offline_inference.py
-           pytest -sv tests/ops
+           VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
+           # AscendScheduler doesn't work, fix it later
+           # pytest -sv tests/singlecard/tets_schedule.py
+           # guided decoding doesn't work, fix it later
+           # pytest -sv tests/singlecard/test_guided_decoding.py.py
+           pytest -sv tests/singlecard/ --ignore=tests/singlecard/test_offline_inference.py --ignore=tests/singlecard/test_scheduler.py --ignore=tests/singlecard/test_guided_decoding.py
          else
            pytest -sv tests/multicard/test_ilama_lora_tp2.py
-           pytest -sv -k "QwQ" tests/multicard/test_offline_inference_distributed.py
-           pytest -sv -k "DeepSeek" tests/multicard/test_offline_inference_distributed.py
-           pytest -sv tests/ops
-         fi
-
-     # only run test on spec decode when the related code changed
-     - name: Check for changes in Speculative Decode
-       if: github.event_name != 'schedule'
-       id: filter_spec_decode
-       uses: dorny/paths-filter@v3
-       with:
-         filters: |
-           speculative_tests_changed:
-             - ".github/workflows/vllm_ascend_test.yaml"
-             - "tests/singlecard/spec_decode/**"
-             - "tests/multicard/spec_decode_e2e/**"
-             - "vllm_ascend/worker/worker.py"
-             - "vllm_ascend/worker/model_runner.py"
-             - "vllm_ascend/worker/multi_step_runner.py"
-             - "vllm_ascend/worker/multi_step_worker.py"
-             - "vllm_ascend/worker/draft_model_runner.py"
-             - "vllm_ascend/patch/worker/patch_common/patch_metrics.py"
-             - "vllm_ascend/patch/worker/patch_common/patch_spec_decode_worker.py"
-             - "vllm_ascend/patch/worker/patch_common/patch_multi_step_worker.py"
-
-     - name: Run vllm-project/vllm-ascend Speculative Decode test
-       if: steps.filter_spec_decode.outputs.speculative_tests_changed == 'true' || github.event_name == 'schedule'
-       run: |
-         if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
-           VLLM_USE_MODELSCOPE=true pytest -sv tests/singlecard/spec_decode/e2e/test_v1_spec_decode.py
-           pytest -sv tests/singlecard/spec_decode/e2e/test_mtp_correctness.py # it needs a clean process
-           pytest -sv tests/singlecard/spec_decode --ignore=tests/singlecard/spec_decode/e2e/test_mtp_correctness.py --ignore=tests/singlecard/spec_decode/e2e/test_v1_spec_decode.py
+           # Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.
+           VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
+           VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
+           VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py --ignore=tests/multicard/test_offline_inference_distributed.py
          fi
-
-     - name: Run vllm-project/vllm test for V0 Engine
-       env:
-         VLLM_USE_V1: 0
-         PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
-       run: |
-         pytest -sv
New workflow file
Lines changed: 98 additions & 0 deletions

@@ -0,0 +1,98 @@
+#
+# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
+# This file is a part of the vllm-ascend project.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: 'e2e test / long-term-test'
+
+on:
+  schedule:
+    # Runs at 23:00 UTC (7:00 AM Beijing) every day
+    - cron: '0 23 * * *'
+  pull_request:
+    types: [ labeled ]
+
+# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
+# declared as "shell: bash -el {0}" on steps that need to be properly activated.
+# It's used to activate ascend-toolkit environment variables.
+defaults:
+  run:
+    shell: bash -el {0}
+
+concurrency:
+  group: pr-${{ github.event.pull_request.number }}
+  cancel-in-progress: true
+
+jobs:
+  long-term-test:
+    # long-term-test will be triggered when tag 'long-term-test' & 'ready-for-test' or schedule job
+    if: ${{ contains(github.event.pull_request.labels.*.name, 'long-term-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
+    strategy:
+      max-parallel: 2
+      matrix:
+        vllm_version: [main, v0.8.5.post1]
+    name: vLLM Ascend long term test
+    runs-on: linux-arm64-npu-1
+    container:
+      # TODO(yikun): Remove m.daocloud.io prefix when infra proxy ready
+      image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
+    env:
+      HF_ENDPOINT: https://hf-mirror.com
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}
+      VLLM_LOGGING_LEVEL: ERROR
+    steps:
+      - name: Check npu and CANN info
+        run: |
+          npu-smi info
+          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
+
+      - name: Config mirrors
+        run: |
+          sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
+          pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+          apt-get update -y
+          apt install git -y
+          git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
+
+      - name: Checkout vllm-project/vllm-ascend repo
+        uses: actions/checkout@v4
+
+      - name: Install system dependencies
+        run: |
+          apt-get -y install `cat packages.txt`
+          apt-get -y install gcc g++ cmake libnuma-dev
+
+      - name: Checkout vllm-project/vllm repo
+        uses: actions/checkout@v4
+        with:
+          repository: vllm-project/vllm
+          ref: ${{ matrix.vllm_version }}
+          path: ./vllm-empty
+
+      - name: Install vllm-project/vllm from source
+        working-directory: ./vllm-empty
+        run: |
+          VLLM_TARGET_DEVICE=empty pip install -e .
+
+      - name: Install vllm-project/vllm-ascend
+        run: |
+          pip install -r requirements-dev.txt
+          pip install -v -e .
+
+      - name: Run vllm-project/vllm-ascend long term test
+        run: |
+          # spec decode test
+          VLLM_USE_MODELSCOPE=true pytest -sv tests/long_term/spec_decode/e2e/test_v1_spec_decode.py
+          VLLM_USE_MODELSCOPE=True pytest -sv tests/long_term/spec_decode/e2e/test_mtp_correctness.py # it needs a clean process
+          pytest -sv tests/long_term/spec_decode --ignore=tests/long_term/spec_decode/e2e/test_mtp_correctness.py --ignore=tests/long_term/spec_decode/e2e/test_v1_spec_decode.py
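Given the gating above and the label-ordering issue described in the commit message, the labels can be applied from the command line. The snippet below is only an illustration using the GitHub CLI; the PR number is a placeholder, and the label names come from the workflow conditions in this diff.

```bash
# Illustrative sketch: trigger the labeled test runs on a PR with the GitHub CLI.
# "123" is a placeholder PR number; label names match the workflow conditions above.

# 1. Decide which labeled suites should run and apply those labels first.
gh pr edit 123 --add-label long-term-test
gh pr edit 123 --add-label pd-test

# 2. Only then add ready-for-test, which actually starts the labeled runs.
gh pr edit 123 --add-label ready-for-test
```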

.github/workflows/vllm_ascend_test_pd.yaml
Lines changed: 9 additions & 4 deletions

@@ -30,13 +30,18 @@ defaults:
   run:
     shell: bash -el {0}

+concurrency:
+  group: pr-${{ github.event.pull_request.number }}
+  cancel-in-progress: true
+
 jobs:
-  test:
-    if: ${{ github.event.label.name == 'module:pd' }}
+  prefilling-decoding-disaggregation:
+    # pd-test will be triggered when tag 'pd-test' & 'ready-for-test' or schedule job
+    if: ${{ contains(github.event.pull_request.labels.*.name, 'pd-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
     strategy:
       matrix:
-        vllm_verison: [v0.8.5.post1]
-    name: vLLM Ascend test
+        vllm_verison: [main, v0.8.5.post1]
+    name: vLLM Ascend prefilling decoding disaggregation test
     runs-on: linux-arm64-npu-static-8

     container:

format.sh
Lines changed: 1 addition & 2 deletions

@@ -272,9 +272,8 @@ echo 'vllm-ascend isort: Done'

 # Clang-format section
 # Exclude some files for formatting because they are vendored
-# NOTE: Keep up to date with .github/workflows/clang-format.yml
 CLANG_FORMAT_EXCLUDES=(
-    'csrc/kernels/pos_encoding_kernels.cpp'
+    'csrc/kernels/pos_encoding_kernels.cpp' 'csrc/kernels/advance_step.cpp' 'csrc/torch_binding.cpp' 'csrc/ops.h'
 )

 # Format specified files with clang-format
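For context, `CLANG_FORMAT_EXCLUDES` is an ordinary bash array of vendored paths to skip. The snippet below is a minimal sketch of how such an exclude array is commonly consumed, not the actual format.sh logic; the `find` scope and pipeline are assumptions for illustration only.

```bash
# Illustrative only: one common way a bash exclude array is applied before formatting.
# This is not the actual format.sh implementation, just a sketch of the pattern.
CLANG_FORMAT_EXCLUDES=(
    'csrc/kernels/pos_encoding_kernels.cpp' 'csrc/kernels/advance_step.cpp' 'csrc/torch_binding.cpp' 'csrc/ops.h'
)

# Collect C/C++ sources and drop anything on the exclude list before running clang-format.
find csrc/ -name '*.cpp' -o -name '*.h' \
    | grep -vFf <(printf '%s\n' "${CLANG_FORMAT_EXCLUDES[@]}") \
    | xargs clang-format -i
```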

tests/singlecard/spec_decode/e2e/conftest.py renamed to tests/long_term/spec_decode/e2e/conftest.py
Lines changed: 2 additions & 64 deletions

@@ -20,13 +20,10 @@
 import shutil
 from itertools import cycle
 from pathlib import Path
-from typing import List, Optional, Sequence, Tuple, Union
+from typing import Optional, Sequence, Union

-import pytest
 import torch
-from vllm import LLM, SamplingParams
-from vllm.distributed import cleanup_dist_env_and_memory
-from vllm.model_executor.utils import set_random_seed
+from vllm import SamplingParams
 from vllm.sequence import PromptLogprobs, SampleLogprobs

 from ....model_utils import (TokensTextLogprobs,
@@ -45,65 +42,6 @@
 ]


-@pytest.fixture
-def test_llm_generator(common_llm_kwargs, per_test_common_llm_kwargs,
-                       test_llm_kwargs, seed):
-
-    def generate():
-        kwargs = {
-            **common_llm_kwargs,
-            **per_test_common_llm_kwargs,
-            **test_llm_kwargs,
-        }
-
-        llm = LLM(**kwargs)
-
-        if seed is not None:
-            set_random_seed(seed)
-
-        yield llm
-
-        del llm
-        cleanup_dist_env_and_memory()
-
-    return generate
-
-
-def maybe_assert_ngram_worker(llm):
-    # Verify the proposer worker is ngram if ngram is specified.
-    if (llm.llm_engine.speculative_config is not None
-            and llm.llm_engine.speculative_config.method == "ngram"):
-        from vllm.spec_decode.ngram_worker import NGramWorker
-        assert isinstance(
-            llm.llm_engine.model_executor.driver_worker.proposer_worker,
-            NGramWorker)
-
-
-def get_output_from_llm_generator(
-        llm_generator, prompts,
-        sampling_params) -> Tuple[List[str], List[List[int]], float]:
-    tokens: List[str] = []
-    token_ids: List[List[int]] = []
-    acceptance_rate: float = -1.0
-    for llm in llm_generator():
-        maybe_assert_ngram_worker(llm)
-
-        outputs = llm.generate(prompts, sampling_params, use_tqdm=True)
-
-        token_ids = [output.outputs[0].token_ids for output in outputs]
-        tokens = [output.outputs[0].text for output in outputs]
-
-        # Fetch acceptance rate if logging is enabled.
-        if stat_loggers := getattr(llm.llm_engine, "stat_loggers", None):
-            stat_logger = stat_loggers["prometheus"]
-            acceptance_rate = (stat_logger.metrics.
-                               gauge_spec_decode_draft_acceptance_rate.labels(
-                                   **stat_logger.labels)._value.get())
-        del llm
-
-    return tokens, token_ids, acceptance_rate
-
-
 def check_logprobs_correctness(
     spec_outputs: Sequence[Union[TokensTextLogprobs,
                                  TokensTextLogprobsPromptLogprobs]],

tests/singlecard/spec_decode/e2e/test_medusa_correctness.py renamed to tests/long_term/spec_decode/e2e/test_medusa_correctness.py
Lines changed: 2 additions & 7 deletions

@@ -41,9 +41,9 @@

 import pytest

-from tests.singlecard.spec_decode.e2e.conftest import \
+from tests.long_term.spec_decode.e2e.conftest import \
     run_equality_correctness_test
-from tests.singlecard.spec_decode.utils import maybe_enable_chunked_prefill
+from tests.long_term.spec_decode.utils import maybe_enable_chunked_prefill

 # main model
 # lmsys/vicuna-7b-v1.3 was to be used but it's causing
@@ -443,8 +443,3 @@ def test_mqa_scorer(vllm_runner, common_llm_kwargs, per_test_common_llm_kwargs,
                                   max_output_len=output_len,
                                   seed=seed,
                                   temperature=0.0)
-
-
-if __name__ == "__main__":
-    import pytest
-    pytest.main([__file__])

tests/singlecard/spec_decode/e2e/test_mlp_correctness.py renamed to tests/long_term/spec_decode/e2e/test_mlp_correctness.py
Lines changed: 2 additions & 2 deletions

@@ -41,9 +41,9 @@
 from vllm.model_executor.layers.vocab_parallel_embedding import \
     pad_vocab_size  # noqa: F401

-from tests.singlecard.spec_decode.e2e.conftest import \
+from tests.long_term.spec_decode.e2e.conftest import \
     run_equality_correctness_test
-from tests.singlecard.spec_decode.utils import maybe_enable_chunked_prefill
+from tests.long_term.spec_decode.utils import maybe_enable_chunked_prefill

 # main model
 MAIN_MODEL = "JackFram/llama-160m"

tests/singlecard/spec_decode/e2e/test_mtp_correctness.py renamed to tests/long_term/spec_decode/e2e/test_mtp_correctness.py
Lines changed: 0 additions & 6 deletions

@@ -57,7 +57,6 @@

 # precision
 PRECISION = "bfloat16"
-os.environ["VLLM_USE_MODELSCOPE"] = "True"


 @pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "1",
@@ -450,8 +449,3 @@ def test_mtp_disable_queue(vllm_runner, common_llm_kwargs,
                            per_test_common_llm_kwargs,
                            baseline_llm_kwargs, test_llm_kwargs,
                            batch_size, output_len, seed)
-
-
-if __name__ == "__main__":
-    import pytest
-    pytest.main([__file__])
