Note 1: After thinking about this, I'm realizing I'm woefully underinformed about calculus. I'll work on grokking calculus.

Note 2: After some tests, I now believe this approach to be ineffective at "expertizing" language models. The initial reductions in training loss were likely due to re-training of the newly created experts, which defeats the purpose of this repo. (If you need to fully re-train the MoE block, what's the point of starting from the base model's weights and applying SVD?)
AstrisCantCode/Expertize
About
A simple set of scripts to convert a dense model into a Mixture of Experts (MoE) model
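The notes above mention creating experts from the base model's weights by applying SVD. As a rough, hypothetical sketch of what that could look like in PyTorch (not the repo's actual scripts; the function name, expert count, and rank below are all assumptions), one way to split a dense projection into low-rank "experts" is to give each expert a disjoint band of singular directions:

```python
import torch

def make_experts_from_dense(weight: torch.Tensor, num_experts: int, rank: int):
    """Split a dense weight matrix (out_features x in_features) into
    `num_experts` low-rank factor pairs using its SVD. Hypothetical sketch,
    not taken from the repository's code."""
    # Thin SVD: U is (out x k), S is (k,), Vh is (k x in), k = min(out, in).
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    assert num_experts * rank <= S.numel(), "not enough singular directions"
    experts = []
    for e in range(num_experts):
        # Each expert keeps a disjoint band of `rank` singular directions.
        band = slice(e * rank, (e + 1) * rank)
        down_proj = torch.diag(S[band]) @ Vh[band, :]  # (rank x in_features)
        up_proj = U[:, band]                           # (out_features x rank)
        experts.append((up_proj, down_proj))           # expert(x) = up_proj @ down_proj @ x
    return experts

# Example: split a 512x512 dense projection into 4 rank-32 experts.
dense = torch.randn(512, 512)
experts = make_experts_from_dense(dense, num_experts=4, rank=32)
for up_proj, down_proj in experts:
    print(up_proj.shape, down_proj.shape)  # torch.Size([512, 32]) torch.Size([32, 512])
```

With this kind of split, only the sum of all expert outputs (and only if the bands cover every singular direction) reconstructs the original projection exactly; any routed subset is just an approximation, which is consistent with Note 2's observation that the resulting MoE block still needs substantial re-training.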