Add ChipAlign geodesic interpolation method #529

Open
wants to merge 1 commit into
base: main

dlmastery

Implements the ChipAlign model merging technique (https://arxiv.org/abs/2412.19819), which combines instruction-aligned models with domain-specific models using geodesic interpolation with magnitude preservation.

Features:

  • Added geodesic interpolation option to NuSLERP merge method
  • Added ChipAlign example configuration in examples/chipalign.yml
  • Updated documentation in README.md
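
As a rough illustration of what "geodesic interpolation with magnitude preservation" can mean, here is a minimal NumPy sketch. It is not the code from this PR. The function name `geodesic_merge`, the convention that `lam=1` returns the instruction-aligned tensor, and the choice to restore magnitude with a weighted geometric mean of the two norms are all assumptions made for the example.

```python
import numpy as np


def geodesic_merge(w_instruct, w_domain, lam, eps=1e-8):
    """Hypothetical sketch: slerp between unit directions, then rescale.

    Assumed convention: lam=0 returns w_domain, lam=1 returns w_instruct.
    """
    shape = w_instruct.shape
    a = np.asarray(w_instruct, dtype=np.float64).ravel()
    b = np.asarray(w_domain, dtype=np.float64).ravel()

    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a < eps or norm_b < eps:
        # A zero tensor has no direction, so fall back to a linear blend.
        return (lam * a + (1.0 - lam) * b).reshape(shape)

    # Project both tensors onto the unit sphere.
    ua, ub = a / norm_a, b / norm_b
    cos_theta = np.clip(np.dot(ua, ub), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)

    if sin_theta < eps:
        # Nearly parallel or antiparallel: the slerp weights are ill-defined,
        # so blend the unit vectors linearly instead.
        direction = lam * ua + (1.0 - lam) * ub
    else:
        # Point at fraction lam along the great-circle arc from ub to ua.
        direction = (
            np.sin(lam * theta) * ua + np.sin((1.0 - lam) * theta) * ub
        ) / sin_theta

    # Assumed magnitude rule: weighted geometric mean of the two norms.
    magnitude = norm_a**lam * norm_b ** (1.0 - lam)
    return (magnitude * direction).reshape(shape)
```

With `w_instruct = [[3, 0]]`, `w_domain = [[0, 4]]` and `lam = 0.5`, the merged tensor points halfway between the two directions and has norm `sqrt(3 * 4)`, whereas a plain linear average would have norm 2.5.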


github-actions bot commented Mar 10, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@dlmastery
Author

I have read the CLA Document and I hereby sign the CLA


# Perform spherical interpolation on unit vectors
from mergekit.merge_methods.slerp import slerp
merged_tensor_unit = slerp(
Collaborator


I'd suggest using the nuslerp function here instead - the old slerp moves tensors to CPU so it's a lot slower. It should give the same results though.

Collaborator


(This also lets us respect nuslerp_flatten and nuslerp_row_wise which would be good.)
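
To make the flatten versus row-wise distinction concrete, here is a small NumPy sketch of slerp applied either to the whole tensor as one vector or to each slice along an axis. The helper names are hypothetical, and the sketch does not call mergekit's `nuslerp`. How `nuslerp_flatten` and `nuslerp_row_wise` map onto a particular axis inside mergekit is not shown in this thread, so the example makes no claim about that.

```python
import numpy as np


def slerp_along_axis(t, v0, v1, axis=-1, eps=1e-8):
    """Slerp unit directions of v0 (t=0) and v1 (t=1), one vector per slice along `axis`.

    Returns interpolated unit directions only; magnitudes are not restored.
    """
    n0 = np.linalg.norm(v0, axis=axis, keepdims=True)
    n1 = np.linalg.norm(v1, axis=axis, keepdims=True)
    u0 = v0 / np.maximum(n0, eps)
    u1 = v1 / np.maximum(n1, eps)

    cos_theta = np.clip(np.sum(u0 * u1, axis=axis, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)

    # Where two vectors are nearly colinear, use plain linear weights.
    colinear = sin_theta < eps
    safe_sin = np.where(colinear, 1.0, sin_theta)
    w0 = np.where(colinear, 1.0 - t, np.sin((1.0 - t) * theta) / safe_sin)
    w1 = np.where(colinear, t, np.sin(t * theta) / safe_sin)
    return w0 * u0 + w1 * u1


def slerp_flattened(t, v0, v1):
    """Treat each tensor as a single long vector."""
    out = slerp_along_axis(t, v0.reshape(-1), v1.reshape(-1), axis=0)
    return out.reshape(v0.shape)


def slerp_per_row(t, v0, v1):
    """Treat each slice along the last axis as its own vector."""
    return slerp_along_axis(t, v0, v1, axis=-1)


v0 = np.array([[1.0, 0.0], [0.0, 2.0]])
v1 = np.array([[0.0, 1.0], [0.0, 2.0]])
flat = slerp_flattened(0.5, v0, v1)    # one angle for the whole tensor
per_row = slerp_per_row(0.5, v0, v1)   # one angle per row
```

The two results differ because the flattened variant computes a single angle between the tensors, while the per-row variant computes a separate angle for each row. Any method layered on top of the interpolation would inherit that choice.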

if abs(sum(weights)) < 1e-6:
    # this is fairly arbitrary, but it's more sane than exploding
    t = 0.5
    t = 0.5  # Default when weights sum to zero
else:
    t = weights[1] / sum(weights)
Collaborator


Is there a reason for introducing a new lambda parameter instead of using t?
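
For context on this question, the snippet under review already derives an interpolation fraction `t` from the two model weights. The sketch below wraps that logic in a hypothetical helper, `interpolation_factor`, and shows that weights of `[1 - lam, lam]` yield `t == lam`. Whether the PR's lambda parameter is meant to play exactly this role is an assumption, since the relevant diff is not shown here.

```python
def interpolation_factor(weights, eps=1e-6):
    """Fraction of the way from model 0 to model 1, derived from two weights."""
    total = sum(weights)
    if abs(total) < eps:
        # Weights cancel out, so pick the midpoint rather than divide by ~0.
        return 0.5
    return weights[1] / total


lam = 0.6
t = interpolation_factor([1.0 - lam, lam])  # same value as lam, up to rounding
```

Under that assumption, a user could express a ChipAlign-style lambda through the existing per-model weights without a separate parameter.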

@cg123
Collaborator

cg123 commented Mar 12, 2025

Thank you for the pull request! Couple of comments in there, and if you could run the pre-commit hook to standardize the formatting that would be appreciated. Would be great to get this in.
