unstable results of qwen-72b-instruct on IFEVAL? #476
nsamples = 512
I ran 4 experiments in one of my environments. Despite some randomness, all yielded reasonable results with W4G128. I saw the same behavior in another environment.
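For reference, a W4G128 run with 512 calibration samples corresponds roughly to the AutoRound call below. This is a minimal sketch, not the exact command used here; the model ID, output directory, and parameter names such as `nsamples` are assumptions and may differ between AutoRound versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-72B-Instruct"  # assumed model ID
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128: 4-bit weights, group size 128, with 512 calibration samples.
# `nsamples` is assumed to be the calibration-set size argument; older
# releases may spell it differently.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, nsamples=512)
autoround.quantize()
autoround.save_quantized("./Qwen2.5-72B-Instruct-W4G128")  # assumed output dir
```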
That's very interesting and very good news! Is this with the HF backend? I usually run vLLM since it is much faster. Maybe it makes a difference?
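For context, both backends can be selected through lm-eval's Python API. A minimal sketch, assuming a recent lm-evaluation-harness; the model ID and the `tensor_parallel_size` value are illustrative:

```python
import lm_eval

# Same task, two backends: the HF (transformers) runner and the vLLM runner.
hf_results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen2.5-72B-Instruct,dtype=bfloat16",
    tasks=["ifeval"],
)

vllm_results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=Qwen/Qwen2.5-72B-Instruct,tensor_parallel_size=4",
    tasks=["ifeval"],
)

print(hf_results["results"]["ifeval"])
print(vllm_results["results"]["ifeval"])
```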
I have verified your int4 model with vLLM. The accuracy is close to these results. I am evaluating your int8 model.
@benjamin-marie For the int8 model, I could reproduce your result with the vLLM backend, while the HF backend is OK. I have opened an issue in the lm-eval harness: EleutherAI/lm-evaluation-harness#2851
I wonder whether the problem might be with vLLM rather than with lm_eval. I'll do some more tests.
With the HF backend, torch 2.6 is fine, but accuracy on torch 2.5 is low.
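Since the score appears to depend on the torch version as well as the backend, it may be worth recording the environment next to each run. A small sketch; the version attributes are assumed to be present in typical installs:

```python
# Print the library versions that this thread suggests can change the IFEval
# score (torch 2.5 vs 2.6, HF vs vLLM backend, lm-eval version).
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)

try:
    import vllm
    print("vllm:", vllm.__version__)
except ImportError:
    print("vllm: not installed")

try:
    import lm_eval
    print("lm_eval:", getattr(lm_eval, "__version__", "unknown"))
except ImportError:
    print("lm_eval: not installed")
```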
The community has reported that we have unstable IFEval results for Qwen2.5-72B-Instruct:
https://kaitchup.substack.com/p/a-comparison-of-5-quantization-methods
The interesting part is that all three recipes report satisfactory results.
@WeiweiZhang1 Could you do 5-10 runs with default parameters and test IFEval with the HF and vLLM backends respectively? 4 bits first, then 8 bits.
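A minimal sketch of that sweep, assuming a recent lm-evaluation-harness and its Python API; the checkpoint paths, the number of repetitions, and the use of `random_seed` to vary runs are assumptions, not the exact protocol:

```python
import lm_eval

# Placeholder paths for the AutoRound int4 (W4G128) and int8 checkpoints.
models = {
    "int4": "<path-or-hub-id-of-int4-model>",
    "int8": "<path-or-hub-id-of-int8-model>",
}

for precision, path in models.items():
    for backend in ("hf", "vllm"):
        for run in range(5):  # 5-10 repetitions per configuration
            results = lm_eval.simple_evaluate(
                model=backend,
                model_args=f"pretrained={path}",
                tasks=["ifeval"],
                random_seed=run,  # vary the seed across repeated runs
            )
            print(precision, backend, run, results["results"]["ifeval"])
```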