Replies: 4 comments 1 reply
-
Considering that all ggml model versions ever released are currently supported by llama.cpp, I don't think that it is very fair to talk about "constant breaking changes".
But that's the thing: we are nowhere near that point. If Q4_2 proves to be better than Q4_0 in every meaningful aspect, I would expect that we will eventually remove support for Q4_0. Maintaining old code has a cost, and at this point it wouldn't make much sense to slow down development because of that. If you want to live on the bleeding edge, pulling daily from master, breaking changes should be expected. Would it be better if we tagged a 0.1 release and directed people to use that instead?
-
Personally, nah. If something breaks, one can always check out the previous commit until they have a reason to move on to a newer one. Or, if you're like me, just mix and match the commits you need.
-
Nope, and I am looking forward to seeing how the Q4_2 version of 65B works.
-
A breaking change should always be documented in the commits or, better still, labeled in the PR. Now that Stability AI has announced that its newly released StableLM will be compatible with llama.cpp in the future, it is much more important to have structured collaboration among all contributors.
-
I wanted to write a comment in #1026, but maybe I'm alone, so I want to know how other people feel about it. (Edit: just as I wrote this, the PR removed the "breaking changes" tag and now considers Q4_2)
It seems like there is a nice ecosystem of models forming around this project, but it's getting ravaged by repeated file format changes. At least before, we had scripts to migrate them to the new format, but superseding one Q4_0 with another will completely turn quantized weights into a pumpkin, and no script will fix that; the only fix is repeating quantization from the source files. That requires huge downloads of the original files (>20 GB if the author was charitable enough to release .bin files in fp16, otherwise double that number). Not to mention the pollution of the internet with files that no longer work. Unless you know the 'expiration date' on ggml files introduced by each commit, it's Russian roulette for the end user.
I know it's annoying to support a growing legacy of formats, but at some point the project must mature and start handling that.
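For anyone who wants to check what they have on disk, here is a rough sketch of reading the container header of a model file. It is not part of llama.cpp: the function names are made up for illustration, and the magic values are my assumption of what the loader used around this time (a 32-bit little-endian magic, followed by a 32-bit version for the two versioned formats).

```python
import struct

# Assumed magic values (32-bit little-endian), based on my reading of
# the llama.cpp loader around this time; treat them as assumptions.
MAGIC_UNVERSIONED = 0x67676D6C  # "ggml", no version field follows
MAGIC_NAMES = {
    MAGIC_UNVERSIONED: "ggml (unversioned)",
    0x67676D66: "ggmf",
    0x67676A74: "ggjt",
}


def describe_header(header: bytes) -> str:
    """Describe the container format from the first 8 bytes of a file."""
    if len(header) < 4:
        return "too short to be a ggml file"
    (magic,) = struct.unpack_from("<I", header, 0)
    name = MAGIC_NAMES.get(magic)
    if name is None:
        return f"unknown magic 0x{magic:08x}"
    if magic == MAGIC_UNVERSIONED:
        return name
    if len(header) < 8:
        return f"{name}, version field missing"
    (version,) = struct.unpack_from("<I", header, 4)
    return f"{name}, version {version}"


def describe_file(path: str) -> str:
    """Hypothetical helper: read only the header bytes of a model file."""
    with open(path, "rb") as f:
        return describe_header(f.read(8))
```

Note that this only identifies the container. A change to the layout of an existing quantization type such as Q4_0 would go unnoticed unless the version number were bumped along with it, which is exactly the concern here.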