[Bug]: quantization does not support 'ascend' #902

Open
daiqifeng-sys opened this issue May 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@daiqifeng-sys

Your current environment

vllm serve /tmp/modelscope/hub/models/QwQ-32B-w8a8/ --served-model-name qwq-32b-w8a8 --host 0.0.0.0 --port 9007 -tp 4 --max-model-len 32768 --quantization ascend

It reports that 'ascend' is not supported.
The image is: m.daocloud.io/quay.io/ascend/vllm-ascend:v0.8.5rc1.

🐛 Describe the bug

usage: vllm serve [model_tag] [options]
vllm serve: error: argument --quantization/-q: invalid choice: 'ascend' (choose from 'aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'nvfp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', None)

@daiqifeng-sys added the bug (Something isn't working) label May 20, 2025
@wangxiyuan
Collaborator

wangxiyuan commented May 20, 2025

Online mode only works with --quantization ascend after this PR: 00e0243.

0.8.5 doesn't work; can you cherry-pick this PR by hand and try again?
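A rough sketch of what "cherry-pick this PR by hand" could look like. This is an assumption, not a documented procedure: the repository URL, tag name, and install step are guesses based on the image version above; only the short commit id 00e0243 comes from the comment. Conflicts may need manual resolution.

```shell
# Check out vllm-ascend at the release matching the image (tag name assumed)
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git checkout v0.8.5rc1

# Fetch the branch containing the fix and apply the commit on top
git fetch origin main
git cherry-pick 00e0243

# Reinstall the patched package inside the container
pip install -e .
```

After reinstalling, rerun the original `vllm serve ... --quantization ascend` command to verify the flag is now accepted.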
