Your current environment

vllm serve /tmp/modelscope/hub/models/QwQ-32B-w8a8/ --served-model-name qwq-32b-w8a8 --host 0.0.0.0 --port 9007 -tp 4 --max-model-len 32768 --quantization ascend

It reports that ascend is not supported. The image is m.daocloud.io/quay.io/ascend/vllm-ascend:v0.8.5rc1.

🐛 Describe the bug

usage: vllm serve [model_tag] [options]
vllm serve: error: argument --quantization/-q: invalid choice: 'ascend' (choose from 'aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'nvfp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', None)
Online mode only works with --quantization ascend as of this PR: 00e0243
--quantization ascend doesn't work in 0.8.5. Can you cherry-pick that PR by hand and try again?
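In case it helps, a minimal sketch of the cherry-pick workflow. It assumes a source checkout of vllm-ascend, that 00e0243 is the commit from the PR above, and that a v0.8.5rc1 tag matching the image version exists:

git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git checkout v0.8.5rc1      # assumed tag matching the image version
git cherry-pick 00e0243     # apply the fix commit by hand
pip install -e .            # reinstall so vllm serve picks up the change

After reinstalling, rerun the vllm serve command above with --quantization ascend.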