mergekit-extract-lora broken in Mergekit 0.1.0 #521
Comments
Nope; I reverted to Mergekit 0.0.5.2 (Dec 10 2024), re-did the LoRAs + merge YAML => quants. Something is wrong with the way the LoRAs are built in 0.1.0.

Specifics: the usual `models:` merge config (I also did one via passthrough; it also works, no issue). I have used this YAML with different models/archs (and the LoRAs extracted from them) without issue until Mergekit 0.1.0. This is based on `--save-module lm_head` working sometimes (Llama 3.2 3B, but not always with Llama 3.1 8B??) and solving some of the LoRA + merge issues.

ADDED NOTES: Word repeats occur in generation, which are part of the LoRA rank/layer capture. In this case the LoRA transfer appears correct - the model attains "reasoning/thinking" - but the actual reasoning "vocab" appears corrupt or misaligned, i.e. it repeats words/phrases for "thinking/reasoning" 4, 5... 10 times where it should be a single word/phrase.

Pre-v0.1.0 adapter vs. post-v0.1.0: there is a mismatch in both target modules and "saved" modules. It appears pre-v0.1.0 saves more modules, plus one additional target module:

https://huggingface.co/kromcomp/L3.1-Prev0.1-Control-Nanuq-r128-LoRA/blob/main/adapter_config.json
https://huggingface.co/kromcomp/L3.1-Postv0.1-Control-Nanuq-r128-LoRA/blob/main/adapter_config.json

Either not enough modules are being saved by default in the newest version, and/or one of the `target_modules` is missing; that may be the root, or part of the root, of the issue. Also: converting to float32 and then back again might add further unknown issues from rounding errors. I have been discussing issues related to v0.1.0 with another colleague who is having the same problems with the new version.
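To make the module mismatch concrete, here is one way to diff the two adapter configs linked above - a rough sketch, assuming `curl` and `jq` are available and that the difference shows up in the standard PEFT `target_modules` / `modules_to_save` fields:

```bash
#!/usr/bin/env bash
# Sketch: diff the target/saved module lists of the pre- and post-v0.1.0 adapters.
# The URLs are the two repos linked above, fetched via Hugging Face's resolve endpoint.
PRE=https://huggingface.co/kromcomp/L3.1-Prev0.1-Control-Nanuq-r128-LoRA/resolve/main/adapter_config.json
POST=https://huggingface.co/kromcomp/L3.1-Postv0.1-Control-Nanuq-r128-LoRA/resolve/main/adapter_config.json

# Pull both configs and print the fields that describe which modules get
# LoRA factors (target_modules) and which are saved whole (modules_to_save).
diff <(curl -sL "$PRE"  | jq '{target_modules, modules_to_save}') \
     <(curl -sL "$POST" | jq '{target_modules, modules_to_save}')
```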
Thank you for the thorough report! This was very helpful for tracking down what was going on. I think I've fixed this in #522 - for the specific model/base combo here, the results are now equal to v0.0.6 up to floating point precision.
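For anyone who wants to try the fix before it lands in a release, one option is to install mergekit straight from the PR ref - a sketch, assuming the repository is `arcee-ai/mergekit` and that pip can reach GitHub:

```bash
# Sketch: install mergekit from the open PR (#522) into a fresh virtualenv,
# so the fix can be tested without touching an existing install.
python -m venv mergekit-pr-test
source mergekit-pr-test/bin/activate
pip install "git+https://github.com/arcee-ai/mergekit.git@refs/pull/522/head"

# Then re-run the extraction that previously misbehaved, e.g.:
# mergekit-extract-lora --model=<finetune> --base-model=<base> --out-path=<dir> --max-rank=128
```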
lmao, I took so long to write my first comment that @cg123 you've already put out a possible fix. Well, I'll make this next bit as succinct as I can manage so I'm not wasting time.

With the LoRA extractor issues addressed, the second part is the actual application of LoRAs on models, which also seems to differ between pre- and post-v0.1 versions. For this part of my experiment, I took the two LoRAs created with the pre- and post-extractor and did a total of four merges: two in pre-v0.1 mergekit and two in post-v0.1 mergekit. I used the same base model for all four merges and ran the following config, without any optional flags save for one:

```yaml
models:
  - model: AGI-0/smartllama3.1-8B-001+kromcomp/L3.1-Prev0.1-Control-Nanuq-r128-LoRA
    parameters:
      int8_mask: true
merge_method: passthrough
dtype: bfloat16
name: prev0.1
---
models:
  - model: AGI-0/smartllama3.1-8B-001+kromcomp/L3.1-Postv0.1-Control-Nanuq-r128-LoRA
    parameters:
      int8_mask: true
merge_method: passthrough
dtype: bfloat16
name: postv0.1
```

The results are the following four models (already quantized at Q8_0 on the latest llama.cpp): https://huggingface.co/kromquant/trial-Post-to-Post-GGUFs

Models are named to document which version of LoRA was used in which version of mergekit; Pre and Post refer to before and after v0.1. So the model 'Trial Pre to Post' was made with the pre-v0.1 LoRA in post-v0.1 mergekit. I'll shorthand them to pre/pre, pre/post, etc. for efficiency.

The expectation would be for the same LoRA plus model to result in the same merge regardless of mergekit version. However, that's not what I've been seeing in testing. I've kept my testing simple: same samplers, same context, and no model instructions. In short, post-v0.1 mergekit seems to be missing a value, or applying one differently, resulting in slight changes.

For pre/pre vs. pre/post, both models are relatively logically sound considering the nature of the base model and respective LoRA. Pre/pre feels more rigid in a sense, closer to the structure of the base model, compared to pre/post, which has more fluidity similar to the finetune used to make the LoRA. That said, pre/post tends to hallucinate more than pre/pre and is more forgetful about existing context. For post/pre vs. post/post, they're both dumber than the other two.

Outside this experiment, there are other points to note. These models came out usable because the two LoRAs were pulled off the correct base and finetune. LoRAs pulled with mismatched bases/finetunes are corrupted, and in the pre-v0.1 extractor/mergekit that usually expresses itself as repeating a few choppy words near the end of generation. Post-v0.1 corruptions tend to manifest in endless adjective word vomit, and the tolerance for corruption is smaller.
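For reference, a sketch of how the four merges above could be reproduced. It assumes the two-document config is split into `prev0.1.yml` and `postv0.1.yml`, that two virtualenvs (`mergekit-old`, `mergekit-new`) hold the pre-v0.1 and post-v0.1 versions, and that `--cuda` stands in for whichever single optional flag was actually used:

```bash
# Sketch: run the same two configs under both mergekit generations.
# prev0.1.yml / postv0.1.yml contain the documents shown above.

# Pre-v0.1 mergekit
source mergekit-old/bin/activate
mergekit-yaml prev0.1.yml  ./trial-pre-to-pre   --cuda
mergekit-yaml postv0.1.yml ./trial-post-to-pre  --cuda
deactivate

# Post-v0.1 mergekit
source mergekit-new/bin/activate
mergekit-yaml prev0.1.yml  ./trial-pre-to-post  --cuda
mergekit-yaml postv0.1.yml ./trial-post-to-post --cuda
deactivate
```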
David and I suspect the major issue is probably the lack of `lm_head`.

With that, I'd like to propose an idea to entertain about possible quality-of-life changes to the LoRA extractor. Because of all the options and settings that exist, it might be worthwhile to change how the command takes its input. I've made a proof-of-concept config.yaml just to visualize what it could look like. Of course, final say is up to you.

```yaml
finetune: Delta-Vector/Control-Nanuq-8B
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
rank: 128
alpha: 128
epsilon: 0
targeted_modules:
  filter: self_attn, mlp, lm_head
saved_modules:
  filter: input_layernorm, embed_tokens, norm, post_attention_layernorm
```
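For comparison, here is roughly what the same extraction looks like with the current flag-based CLI, using only flags that already appear elsewhere in this thread; the module filters and alpha/epsilon have no counterpart in this sketch, which is much of the point of a config file (the output path is illustrative):

```bash
# Sketch: current-style invocation covering the model, base, rank, and output
# fields of the proposed config. Module filters and alpha/epsilon are omitted
# because their exact flag names aren't pinned down in this thread.
mergekit-extract-lora \
  --model=Delta-Vector/Control-Nanuq-8B \
  --base-model=arcee-ai/Llama-3.1-SuperNova-Lite \
  --out-path=./L3.1-Control-Nanuq-r128-LoRA \
  --max-rank=128 \
  --cuda
```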
So I've tested the possible fix PR and it did solve most issues. There's no runtime error for incompatible floats anymore. Just one problem: it seems like the fix is hard-coded to include `lm_head`. The new merge still suffers from the post-v0.1 mergekit 'jank' and the loracache is still being saved at half precision, but the PR wasn't supposed to fix that, so all is well otherwise.
@kromeurus Thank you for all of this writeup, and for the rigorous experimentation! It's very helpful to know where things are falling short and how things could be more streamlined. I'm definitely open to the idea of a YAML config for LoRA extraction; the full set of options is getting pretty unwieldy. To clarify - by the post-v0.1 'jank', do you mean the results from LoRAs that were extracted pre-PR, without `lm_head` saved? I've pushed some changes to that branch that I think address everything that's been brought up.
@cg123 No problem, and thank you for such a quick response. Glad that my informal analysis could help. A YAML config would be much appreciated, please; the entire command line goes beyond my screen.

RE post-v0.1 'jank': It's hard to explain, especially since it's such a slight difference, and it doesn't seem to be related to `lm_head`. While saving the entire module works, I'm not sure what mergekit uses to meld models and LoRAs together, but it might be better for mergekit to be able to read and build with those adapters, compared to just saving the entire module as the sole remedy. This way, mergekit could utilize LoRAs compiled in other software too. The other option could be changing the new extractor to not split those modules in the first place.

Again, thank you for your work and diligence. I'll get back to you on the new fixes and details for the 'jank' once I'm able to.
@cg123 Aight, I'm back with testing finished. Here are the results.

For PR fixes: almost everything has been addressed, but a few new problems popped up. I pulled this LoRA (same models as previous versions in this thread) with the new PR and the new flag.
Also, thank you for that. If a YAML config input does get implemented, you could safely remove half the flags since they'd be declared in one way or another. Adding new options should also be easier, though ymmv. Additionally, I recommend changing that default setting.

As for the aforementioned 'jank' in post-v0.1 mergekit: it's not worth pursuing a possible fix. I've been trying to express it in an easily identifiable way but I can't. This might just be a tiny detail that only those who've spent months exclusively using mergekit would spot (i.e. David and I, lol). For the average practical user, it shouldn't matter, as everything else is still working as intended. If we're able to quantifiably find and replicate this, we'll open another issue for it when the time comes.

Again, thank you for all your work and help. Mergekit is a fantastic toolkit to build with. I've been using it for over a year and I'm still learning new things; David would agree with me. I'd like to ask if I could possibly collaborate with you on making a proper guide of sorts on how to use mergekit. There's been an uptick in new mergers in the AI space, but the resources to understand mergekit are few and far between, not to mention how technical those existing resources are. Would you be interested in practical documentation of mergekit with modern examples?
Second the motion:

> I'd like to ask if I could possibly collaborate with you on making a proper guide of sorts on how to use mergekit. There's been an uptick in new mergers in the AI space, but the resources to understand mergekit are few and far between. Not to mention how technical those existing resources are. Would you be interested in a practical documentation of mergekit with modern examples?
Okay, I think it should actually be good to go now! Having more and better documentation would be fantastic; that's definitely a weak point. I'd be very interested in collaborating with you both on something like this.
Many thanks.
Addresses #521. Also adds:

* `--lora-merge-dtype` to specify the dtype to use when applying LoRA adapters to models
* `--gpu-rich` alias for convenience
* Organized display of options in `--help`
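A possible usage sketch for the new option on a merge command; the exact dtype strings accepted by `--lora-merge-dtype` are assumed to match mergekit's usual dtype names:

```bash
# Sketch: apply LoRA adapters at full precision during a merge, instead of a
# half-precision LoRA cache, using the new --lora-merge-dtype option.
# (The same PR also adds a --gpu-rich convenience alias.)
mergekit-yaml config.yml ./output-model \
  --cuda \
  --copy-tokenizer \
  --lora-merge-dtype float32
```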
Issue when merging with a LoRA created by `mergekit-extract-lora`, then adding this LoRA with `mergekit-yaml` [updated Mergekit, Feb 20 2025]:
```
C:\Program Files\Python312\Lib\site-packages\peft\peft_model.py:566: UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.lm_head.lora_A.default.weight', 'base_model.model.lm_head.lora_B.default.weight']
```
LoRA:

```
mergekit-extract-lora --model=D:/DeepSeek-R1-Distill-Llama-8B --base-model=G:/7b/Meta-Llama-3.1-8B-Instruct --out-path=E:/Base-DeepLora8b-BASE-L3.1-LORA-32 --no-lazy-unpickle --max-rank=32 --cuda
```
NOTE: This pull is active in my install - it may be contributing to the issue (?): #510
MERGEKIT:

```
mergekit-yaml --lora-merge-cache E:\huggingface_cache --copy-tokenizer --allow-crimes --cuda --out-shard-size 5B --lazy-unpickle --clone-tensors f:/mergefiles/L3-Llama_inst-Deepseek-Lora-BASED32-org.txt E:/L3-Llama_instruct-Deepseek-Lora-BASE-org
```
The model does work, but not correctly. I tried `--save-module lm_head`; sometimes this works (?). Prior LoRAs (extracted with the older version of `mergekit-extract-lora`) DO work correctly.
For example, with the older version:

```
mergekit-extract-lora D:/8b-deepseek G:/7b/Meta-Llama-3.1-8B-Instruct E:/DeepLora8bL3.1 --no-lazy-unpickle --rank=32
```