Start vLLM serving with dummy weights
# note: set `load-format=dummy` for a lightweight test; we don't need to download real weights
export HF_ENDPOINT="https://hf-mirror.com"
python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 1 --swap-space 16 --disable-log-stats --disable-log-requests --load-format dummy
Install necessary dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install pandas datasets
Run benchmark for online serving
Wait until the vLLM server is ready before starting the benchmark
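Waiting for readiness can be automated with a small poll-until-ready helper. This is a sketch, assuming the server exposes the OpenAI-compatible `/v1/models` endpoint on the default port 8000; the `wait_until_ready` and `http_probe` names are illustrative, not vLLM APIs.

```python
import time
import urllib.request


def wait_until_ready(probe, timeout_s=300.0, interval_s=5.0):
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def http_probe(url="http://localhost:8000/v1/models"):
    """Return True once the OpenAI-compatible endpoint answers (default port assumed)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

With the server from the previous step running, `wait_until_ready(http_probe)` blocks until it is reachable (or times out), so the benchmark can be launched right after it returns.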
cd /vllm-workspace/vllm/benchmarks
python benchmark_serving.py --model Qwen/Qwen2.5-7B-Instruct --dataset-name random --random-input-len 200 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
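With `--save-result`, the run writes a JSON file of metrics into `--result-dir`. Below is a minimal sketch for pulling out the headline numbers; the metric names (`mean_ttft_ms`, etc.) are assumptions based on recent `benchmark_serving.py` output and may differ across vLLM versions, so check the file you actually get.

```python
import json


def summarize(result):
    """Pick headline serving metrics from a benchmark_serving result dict.

    The key names are assumptions; keys absent from the file are skipped.
    """
    keys = (
        "request_throughput",  # requests per second
        "output_throughput",   # output tokens per second
        "mean_ttft_ms",        # mean time to first token
        "mean_tpot_ms",        # mean time per output token
    )
    return {k: result[k] for k in keys if k in result}


# Usage with a saved result (file name is illustrative):
# with open("result.json") as f:
#     print(summarize(json.load(f)))
```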
Yikun changed the title from "[Usage]: How to quickly run a perf benchmark to determine if performance has improved" to "[Guide]: How to quickly run a perf benchmark to determine if performance has improved" on May 15, 2025.
Your current environment
None
How would you like to use vLLM on Ascend?
Assuming you are using vllm-ascend v0.7.3 and want to know whether a tuning strategy makes sense, the steps above may serve as a reference.
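To decide whether performance actually improved, run the benchmark once on the baseline and once on the tuned build, then compare the two saved JSON result files. A sketch under the assumption that both files use the same metric names; `compare` and its default key list are illustrative.

```python
def compare(baseline, candidate,
            keys=("mean_ttft_ms", "mean_tpot_ms", "request_throughput")):
    """Return the relative change of each metric, candidate vs. baseline.

    Metric names are assumptions based on benchmark_serving.py output;
    adjust them to match the JSON files your runs produced. Metrics
    missing from either file, or zero in the baseline, are skipped.
    """
    deltas = {}
    for k in keys:
        if k in baseline and k in candidate and baseline[k]:
            deltas[k] = (candidate[k] - baseline[k]) / baseline[k]
    return deltas
```

For the latency metrics (TTFT, TPOT) a negative delta is an improvement; for throughput, a positive one.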