A simple set of scripts to convert a dense model into a Mixture of Experts (MoE) model

AstrisCantCode/Expertize

Note: After thinking about this, I'm realizing I'm woefully underinformed about calculus. I'll work on grokking calculus.

Note 2: After some tests, I now believe this approach to be ineffective at 'expertizing' language models. The initial reductions in training loss were likely due to re-training of the newly created experts, which defeats the purpose of this repo. (If the MoE block has to be fully re-trained anyway, there is little point in starting from the base model's weights and applying SVD.)
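For context, the general idea referenced above is to factor a dense feed-forward weight matrix into several low-rank "experts" via SVD. The sketch below is a minimal, hypothetical illustration of that technique, not the repo's actual scripts; the function name, shapes, and slicing scheme are assumptions for demonstration only.

```python
# Hypothetical sketch: split a dense weight matrix into low-rank "experts"
# by assigning each expert a slice of the singular components.
import torch

def expertize_weight(weight: torch.Tensor, num_experts: int, rank: int):
    """Factor a dense weight (out_features x in_features) into
    `num_experts` low-rank expert weights, each built from `rank`
    singular components."""
    # weight = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    experts = []
    for e in range(num_experts):
        sl = slice(e * rank, (e + 1) * rank)
        # Low-rank reconstruction from this expert's slice of components:
        # (out x rank) @ (rank x in) -> (out x in)
        w_e = (U[:, sl] * S[sl]) @ Vh[sl, :]
        experts.append(w_e)
    return experts

if __name__ == "__main__":
    dense = torch.randn(256, 512)  # toy dense FFN projection
    experts = expertize_weight(dense, num_experts=4, rank=64)
    # Here 4 * 64 = 256 components cover the full rank, so the experts
    # sum back to (approximately) the original dense weight.
    approx = torch.stack(experts).sum(dim=0)
    print(torch.dist(approx, dense))
```

As the notes above point out, splitting weights this way does not by itself produce useful experts: the resulting MoE block still needed substantial re-training in the author's tests.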
