[REQUEST] command-A? #750
Comments
Here's what happens during the quantization measurement pass, in case it is helpful (using exllamav2 0.2.8):
These errors happen on layers 62-63; otherwise things seem to go relatively smoothly. It is possible to force the quantization to continue by removing the inf/nan checks from the conversion script, and the resulting quant runs at least somewhat coherently at low context. It would be great if support for Command-A could be added, as it seems to be a really promising model, at least according to my tests. I am not sure how time-consuming this would be, since the model almost runs already, or whether this is something @turboderp would rather not spend time on right now.
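For context, the kind of inf/nan guard being bypassed might look roughly like this (a hypothetical sketch, not the actual code in the conversion script; the function name and arguments are made up):

```python
import torch

def check_finite(hidden_states: torch.Tensor, layer_idx: int) -> None:
    # Hypothetical sketch of an inf/nan guard like the one raised during the
    # measurement pass; not the actual exllamav2 conversion code.
    if torch.isinf(hidden_states).any() or torch.isnan(hidden_states).any():
        raise ValueError(f"inf/nan detected in hidden state at layer {layer_idx}")
```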
Looks like the MLP overflows fp16. Not sure if the residual stream needs to be in fp32 or something. One workaround is to enable clamping; I had to add another check so that adding two clamped values is also clamped. Still running through to make sure it works.
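To illustrate the idea (a minimal sketch under my own assumptions, not the actual change): clamp the MLP output, then clamp again after the residual add, so the sum of two values near the fp16 limit cannot overflow either.

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def clamped_residual_add(residual: torch.Tensor, mlp_out: torch.Tensor) -> torch.Tensor:
    # Clamp the MLP output to the fp16 range, then clamp the residual sum as
    # well, since two individually in-range values can still overflow when added.
    mlp_out = mlp_out.clamp(-FP16_MAX, FP16_MAX)
    return (residual + mlp_out).clamp(-FP16_MAX, FP16_MAX)
```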
I am afraid that clamping is not the solution. The model probably has some peculiarities that need special care? Or at least, this is the sort of output I am getting by just ignoring the overflows (temperature 1.0, min_p 0.1, 4.0 bpw, text completion):
Another with 20K tokens in context:
If I use llama.cpp with the same sampler settings and the same prompts (or pretty much any, really), the answers look noticeably better.
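For anyone trying to reproduce the comparison, those sampler settings correspond to something like the following with exllamav2's sampler (a rough sketch; model and generator setup omitted, and defaults may differ between versions):

```python
from exllamav2.generator import ExLlamaV2Sampler

# Sampler settings used for the comparison above: temperature 1.0, min_p 0.1.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 1.0
settings.min_p = 0.1
```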
Problem
No response
Solution
People have posted some quants of Command-A: https://huggingface.co/lynnea1517/c4ai-command-a-03-2025-exl2-4.5bpw-test or https://huggingface.co/models?search=command-a%20exl
They supposedly don't work well at long context due to the missing support. Are the quants themselves likely fine, or will they have to be redone once the implementation is finished?
Alternatives
No response
Explanation
If they're truly broken, maybe this can save others some bandwidth.
Examples
No response
Additional context
No response
Acknowledgements