Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inter-architecture weights mapping #165

Merged
merged 29 commits into from
Feb 22, 2024
Merged

Inter-architecture weights mapping #165

merged 29 commits into from
Feb 22, 2024

Conversation

metric-space
Copy link
Contributor

@metric-space metric-space commented Feb 11, 2024

  • Ensure MergePlanner takes into account differing architectures as opposed to how things were previously ( the same architecture was expected of all models to be merged)

How does this work?

A mapping is defined as a json file between the base model and each model to be merged in. Specifically at the pre/post weights level

Each json file corresponds to a base-architecture to destination pair (though the actual direction is the reverse)

an example of this is

{
    "start_architectures": ["A"],
    "destination_architectures": ["B"],
    "weights_mapping":{
        "a" : "d",
        "b" : "e",
        "c" : "m"
    }
}

Notes:

  1. string templates are allowed in both keys and values
  2. at this time only isomorphic mappings are considered at all levels

@metric-space metric-space changed the title WIP: most likely broken code to open up multiple architectures when p… WIP Feb 11, 2024
@metric-space metric-space changed the title WIP WIP: Align-and-merge MergeMethod Feb 13, 2024
@metric-space metric-space changed the title WIP: Align-and-merge MergeMethod WIP: Pre/post level inter weights matching Feb 20, 2024
@metric-space metric-space changed the title WIP: Pre/post level inter weights matching Inter-archiecture weights mapping Feb 21, 2024
@metric-space metric-space changed the title Inter-archiecture weights mapping Inter-architecture weights mapping Feb 21, 2024
@metric-space metric-space marked this pull request as ready for review February 21, 2024 09:02
@metric-space metric-space requested a review from cg123 February 21, 2024 09:02
Copy link
Collaborator

@cg123 cg123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting to those changes! Looking pretty good now - I had a few more thoughts before we get this merged.

@@ -23,6 +23,7 @@ dependencies = [
"typing-extensions",
"sentencepiece",
"protobuf",
"scipy"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, sorry for missing that dependency!

def set_overrides(self, overrides: Dict[str, str]) -> "ConfiguredArchitectureInfo":
# NOTE: this makes sure template strings in overrides are filled in
overrides = {
self.info._substitute(k, self.config): self.info._substitute(v, self.config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think we actually need to loop over layers here though - if there's a ${layer_index} reference in a template we need to populate it for each value of layer_index. (If you have thoughts on how to make the whole template setup cleaner I'd definitely appreciate them - it's a little silly at the moment.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was careless. I set out out to do exactly that and somehow meandered away. I will make this correction

mergekit/plan.py Outdated
info=self.arch_info,
config=model.config(trust_remote_code=self.trust_remote_code),
)
self.arch_dict: Dict[ModelReference, ConfiguredArchitectureInfo] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight style nitpick: I tend to prefer having type annotations for members in the top level of the class (like in lines 47-50) instead of at first assignment like you have here.

@metric-space metric-space requested a review from cg123 February 21, 2024 23:28
]:
if arch == destination_arch:
is_same_arch = True
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic isn't quite right. Consider a merge with two models (aside from the base) - the first is the same architecture as the base and the second is a different architecture. This will see that the first model has the same architecture and break out of the loop, so the second will not get overrides set.

Copy link
Contributor Author

@metric-space metric-space Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm missing something but the inner most loop loops through the alternate names of both architectures and breaks if it has found a match
But the outermost loop takes care of the actual model list traversal

unless break breaks out of both the loops which I don't think is the case, this should be fine. It is possible I'm overlooking something very trivial

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're absolutely right, sorry about that - I missed the outer loop. Disregard!

Copy link
Contributor Author

@metric-space metric-space Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if I should name things better? I believe the naming of things as of now is very likely to throw the reader off. What do you think?

@cg123 cg123 merged commit 34f29ad into matching-task Feb 22, 2024
6 checks passed
@cg123 cg123 deleted the FDFMM-5 branch February 22, 2024 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants