Custom MoE Routing (llama.cpp)
This is a custom build of llama.cpp, based on a modification to the Mixtral PR, that lets you customize the number of experts routed per token, from 1 expert (fastest) to 8 experts (slowest). Mixtral's MoE setup routes 2 experts per token by default.
The experts.txt file lets you change this number from the default of 2.
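
As a rough illustration of the behaviour (not the actual patch), the sketch below shows how such an override could be read: it assumes experts.txt holds a single integer, falls back to the default of 2 if the file is missing or unreadable, and clamps the value to Mixtral's 1–8 expert range.

```cpp
// Minimal sketch, not the real llama.cpp code: read an expert-count override
// from experts.txt, assuming the file contains a single integer.
#include <algorithm>
#include <cstdio>
#include <fstream>

int read_expert_count(const char * path = "experts.txt", int default_count = 2) {
    std::ifstream f(path);
    int n = default_count;
    if (f >> n) {
        // Mixtral has 8 experts per layer, so keep the override in [1, 8].
        n = std::clamp(n, 1, 8);
    } else {
        // Missing or unreadable file: stick with the default of 2.
        n = default_count;
    }
    return n;
}

int main() {
    printf("experts routed per token: %d\n", read_expert_count());
    return 0;
}
```

Under that assumption about the format, putting a single number such as 4 in experts.txt next to the binary before launching it would route 4 experts per token.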
