-
Notifications
You must be signed in to change notification settings - Fork 523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inter-architecture weights mapping #165
Conversation
35a7963
to
86772ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting to those changes! Looking pretty good now - I had a few more thoughts before we get this merged.
@@ -23,6 +23,7 @@ dependencies = [ | |||
"typing-extensions", | |||
"sentencepiece", | |||
"protobuf", | |||
"scipy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, sorry for missing that dependency!
mergekit/architecture.py
Outdated
def set_overrides(self, overrides: Dict[str, str]) -> "ConfiguredArchitectureInfo": | ||
# NOTE: this makes sure template strings in overrides are filled in | ||
overrides = { | ||
self.info._substitute(k, self.config): self.info._substitute(v, self.config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think we actually need to loop over layers here though - if there's a ${layer_index}
reference in a template we need to populate it for each value of layer_index
. (If you have thoughts on how to make the whole template setup cleaner I'd definitely appreciate them - it's a little silly at the moment.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was careless. I set out out to do exactly that and somehow meandered away. I will make this correction
mergekit/plan.py
Outdated
info=self.arch_info, | ||
config=model.config(trust_remote_code=self.trust_remote_code), | ||
) | ||
self.arch_dict: Dict[ModelReference, ConfiguredArchitectureInfo] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight style nitpick: I tend to prefer having type annotations for members in the top level of the class (like in lines 47-50) instead of at first assignment like you have here.
]: | ||
if arch == destination_arch: | ||
is_same_arch = True | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic isn't quite right. Consider a merge with two models (aside from the base) - the first is the same architecture as the base and the second is a different architecture. This will see that the first model has the same architecture and break out of the loop, so the second will not get overrides set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm missing something but the inner most loop loops through the alternate names of both architectures and breaks if it has found a match
But the outermost loop takes care of the actual model list traversal
unless break
breaks out of both the loops which I don't think is the case, this should be fine. It is possible I'm overlooking something very trivial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're absolutely right, sorry about that - I missed the outer loop. Disregard!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if I should name things better? I believe the naming of things as of now is very likely to throw the reader off. What do you think?
How does this work?
A mapping is defined as a
json
file between the base model and each model to be merged in. Specifically at thepre/post
weights levelEach json file corresponds to a base-architecture to destination pair (though the actual direction is the reverse)
an example of this is
Notes: