Sparse Transfer Learning in Encoder-Decoder Transformer #2362

Open
rakaputra12 opened this issue Apr 22, 2025 · 0 comments

Hello everyone!
I am currently doing a small research project for my studies on sparse transfer learning, and the SparseML library looks like a good approach for my work. My topic is applying sparse transfer learning to different Transformer architectures. As a first step, the Transformer needs to be made sparse (e.g. pruned with GMP). Transformer architectures include encoder-only (e.g. BERT), decoder-only (e.g. GPT), and encoder-decoder (e.g. T5). The first two appear to be technically possible based on the documentation on GitHub that I have looked at. For encoder-decoder models, however, it is still unclear to me whether this is technically possible with SparseML.

In theory I could just adjust/customize the recipe, but it is still unclear to me what exactly I would have to do. What do I have to set in “params”? Is it just “__ALL_PRUNABLE__”, or could you give an example recipe for T5 (encoder-decoder)? That would be very helpful for me. I look forward to your feedback and thank you in advance for your help!
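For reference, here is a rough sketch of what I imagine such a recipe could look like, modeled on the BERT pruning recipes in your GitHub examples. Please note this is only my own draft: the “re:” parameter patterns are guessed from the Hugging Face T5 module names (SelfAttention, EncDecAttention, DenseReluDense), and the epoch and sparsity values are just placeholders, so they may well be wrong.

```yaml
# Draft GMP recipe for T5 (encoder-decoder) -- unverified sketch.
# Module name patterns are guessed from the Hugging Face T5 implementation.

training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0.0
    end_epoch: 30.0

pruning_modifiers:
  - !GMPruningModifier
    # Option A: prune every prunable layer
    # params: __ALL_PRUNABLE__
    # Option B: target encoder and decoder weights explicitly
    params:
      - re:encoder.block.*.layer.*.SelfAttention.q.weight
      - re:encoder.block.*.layer.*.SelfAttention.k.weight
      - re:encoder.block.*.layer.*.SelfAttention.v.weight
      - re:encoder.block.*.layer.*.SelfAttention.o.weight
      - re:encoder.block.*.layer.*.DenseReluDense.wi.weight
      - re:encoder.block.*.layer.*.DenseReluDense.wo.weight
      - re:decoder.block.*.layer.*.SelfAttention.q.weight
      - re:decoder.block.*.layer.*.SelfAttention.k.weight
      - re:decoder.block.*.layer.*.SelfAttention.v.weight
      - re:decoder.block.*.layer.*.SelfAttention.o.weight
      - re:decoder.block.*.layer.*.EncDecAttention.q.weight
      - re:decoder.block.*.layer.*.EncDecAttention.k.weight
      - re:decoder.block.*.layer.*.EncDecAttention.v.weight
      - re:decoder.block.*.layer.*.EncDecAttention.o.weight
      - re:decoder.block.*.layer.*.DenseReluDense.wi.weight
      - re:decoder.block.*.layer.*.DenseReluDense.wo.weight
    init_sparsity: 0.0
    final_sparsity: 0.8
    start_epoch: 2.0
    end_epoch: 26.0
    update_frequency: 0.5
    leave_enabled: True
    inter_func: cubic
    mask_type: unstructured
```

Would something along these lines be the right direction for an encoder-decoder model, or is “__ALL_PRUNABLE__” already enough to cover both the encoder and the decoder weights?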
