Similar to #645, I am getting worse performance and throughput with the quantized version. I used the out-of-the-box quantization example with the basic vLLM script. This is true for both the 7B and 14B models.
With vLLM I see roughly 1.8x slower throughput. When I run the benchmark script, however, AWQ comes out ahead:
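For reference, my run is essentially the stock vLLM offline-inference pattern. This is a minimal sketch; the model path is a placeholder for the AWQ checkpoint produced by the quantization example:

```python
from vllm import LLM, SamplingParams

# Minimal sketch of how I load the quantized model; the model path is a
# placeholder for the checkpoint the quantization example wrote out.
llm = LLM(model="path/to/model-7B-AWQ", quantization="awq")

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Generate against a single prompt just to get a feel for per-request speed.
outputs = llm.generate(["Explain AWQ quantization briefly."], sampling_params)
print(outputs[0].outputs[0].text)
```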
Using the benchmarking script:
vs. the non-quantized model:
Installed packages are:
with
Am I using vLLM incorrectly, or do I need additional packages for AWQ to perform well?