Sparse Transfer Learning in Encoder-Decoder Transformer #2362

Open
rakaputra12 opened this issue Apr 22, 2025 · 0 comments

Hello everyone!
I am currently doing a small research project for my studies on sparse transfer learning, and the SparseML library looks like a good approach for my work. My topic is applying sparse transfer learning to different Transformer architectures. As a first step, the Transformer needs to be made sparse (e.g. pruned with GMP). Transformer architectures include encoder-only (e.g. BERT), decoder-only (e.g. GPT), and encoder-decoder (e.g. T5). The first two appear to be technically possible based on the documentation on GitHub that I have looked at. For encoder-decoder models, however, it is still unclear to me whether this is technically possible with SparseML.

In theory I could just adjust/customize the recipe, but it is still unclear to me what exactly I would have to do. What do I have to set in “params”? Is it just “__ALL_PRUNABLE__”, or could you give an example recipe for T5 (encoder-decoder)? That would be very helpful for me. I look forward to your feedback and thank you in advance for your help!
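For reference, here is a rough sketch of what I imagine such a recipe could look like, modeled on the BERT pruning recipes in your GitHub examples. Please note this is only my own draft: the “re:” parameter patterns are guessed from the Hugging Face T5 module names (SelfAttention, EncDecAttention, DenseReluDense), and the epoch and sparsity values are just placeholders, so they may well be wrong.

```yaml
# Draft GMP recipe for T5 (encoder-decoder) -- unverified sketch.
# Module name patterns are guessed from the Hugging Face T5 implementation.

training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0.0
    end_epoch: 30.0

pruning_modifiers:
  - !GMPruningModifier
    # Option A: prune every prunable layer
    # params: __ALL_PRUNABLE__
    # Option B: target encoder and decoder weights explicitly
    params:
      - re:encoder.block.*.layer.*.SelfAttention.q.weight
      - re:encoder.block.*.layer.*.SelfAttention.k.weight
      - re:encoder.block.*.layer.*.SelfAttention.v.weight
      - re:encoder.block.*.layer.*.SelfAttention.o.weight
      - re:encoder.block.*.layer.*.DenseReluDense.wi.weight
      - re:encoder.block.*.layer.*.DenseReluDense.wo.weight
      - re:decoder.block.*.layer.*.SelfAttention.q.weight
      - re:decoder.block.*.layer.*.SelfAttention.k.weight
      - re:decoder.block.*.layer.*.SelfAttention.v.weight
      - re:decoder.block.*.layer.*.SelfAttention.o.weight
      - re:decoder.block.*.layer.*.EncDecAttention.q.weight
      - re:decoder.block.*.layer.*.EncDecAttention.k.weight
      - re:decoder.block.*.layer.*.EncDecAttention.v.weight
      - re:decoder.block.*.layer.*.EncDecAttention.o.weight
      - re:decoder.block.*.layer.*.DenseReluDense.wi.weight
      - re:decoder.block.*.layer.*.DenseReluDense.wo.weight
    init_sparsity: 0.0
    final_sparsity: 0.8
    start_epoch: 2.0
    end_epoch: 26.0
    update_frequency: 0.5
    leave_enabled: True
    inter_func: cubic
    mask_type: unstructured
```

Would something along these lines be the right direction for an encoder-decoder model, or is “__ALL_PRUNABLE__” already enough to cover both the encoder and the decoder weights?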
