Add ChipAlign geodesic interpolation method #529
Implements the ChipAlign model merging technique from https://arxiv.org/abs/2412.19819, which combines instruction-aligned models with domain-specific models using geodesic interpolation with magnitude preservation.

Features:
- Added geodesic interpolation option to NuSLERP merge method
- Added ChipAlign example configuration in examples/chipalign.yml
- Updated documentation in README.md
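For readers unfamiliar with the idea, here is a minimal standalone sketch of geodesic interpolation with magnitude preservation on flattened weight vectors. It is not the code from this PR or mergekit's implementation: the function name `geodesic_merge` is hypothetical, plain Python lists stand in for tensors, and the magnitude step (a weighted geometric mean of the two norms) is an assumption about how the paper restores scale.

```python
import math
from typing import List, Sequence


def geodesic_merge(
    a: Sequence[float], b: Sequence[float], t: float, eps: float = 1e-8
) -> List[float]:
    """Interpolate from a (t=0) to b (t=1) along the unit-sphere geodesic,
    then rescale the result (hypothetical helper, for illustration only)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        # A zero vector has no direction; fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    # Project both inputs onto the unit sphere.
    ua = [x / norm_a for x in a]
    ub = [y / norm_b for y in b]

    # Angle between the two unit vectors, clamped for numerical safety.
    cos_omega = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if sin_omega < eps:
        # Directions are (anti)parallel; use linear weights instead.
        wa, wb = 1 - t, t
    else:
        wa = math.sin((1 - t) * omega) / sin_omega
        wb = math.sin(t * omega) / sin_omega

    # Assumed magnitude rule: weighted geometric mean of the input norms.
    scale = norm_a ** (1 - t) * norm_b ** t
    return [scale * (wa * x + wb * y) for x, y in zip(ua, ub)]
```

With orthogonal inputs of norm 2 and 8, the midpoint (`t = 0.5`) has norm 4, the geometric mean of the two, while `t = 0` and `t = 1` reproduce the inputs.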
All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
# Perform spherical interpolation on unit vectors
from mergekit.merge_methods.slerp import slerp
merged_tensor_unit = slerp(
I'd suggest using the `nuslerp` function here instead - the old `slerp` moves tensors to CPU so it's a lot slower. It should give the same results though.
(This also lets us respect `nuslerp_flatten` and `nuslerp_row_wise`, which would be good.)
if abs(sum(weights)) < 1e-6:
    # this is fairly arbitrary, but it's more sane than exploding
    t = 0.5
    t = 0.5  # Default when weights sum to zero
else:
    t = weights[1] / sum(weights)
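The weight-to-`t` normalization in the snippet above can be tried in isolation. The sketch below wraps the same logic in a hypothetical helper, `interpolation_factor`, which is not a function in the PR; it only illustrates how two relative weights collapse into a single interpolation factor.

```python
from typing import Sequence


def interpolation_factor(weights: Sequence[float], eps: float = 1e-6) -> float:
    """Turn two relative weights into an interpolation factor t
    (hypothetical helper mirroring the snippet above)."""
    total = sum(weights)
    if abs(total) < eps:
        # Weights cancel out; use the midpoint rather than dividing by ~0.
        return 0.5
    # Fraction of the total assigned to the second model.
    return weights[1] / total
```

For example, weights `[1.0, 3.0]` give `t = 0.75`, and weights that sum to zero, such as `[1.0, -1.0]`, fall back to `t = 0.5`.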
Is there a reason for introducing a new `lambda` parameter instead of using `t`?
Thank you for the pull request! A couple of comments in there, and if you could run the pre-commit hook to standardize the formatting, that would be appreciated. Would be great to get this in.