
Custom MoE Routing (llama.cpp)

@kalomaze released this 13 Dec 04:25

In Mixtral's MoE setup, the default number of experts routed per token is 2.

This is a custom build of llama.cpp, modified from the Mixtral PR, that lets you customize the number of experts routed per token, from 1 expert (fastest) to 8 experts (slowest).
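
For context, here is a minimal standalone sketch of what top-k expert routing does conceptually; it is not the build's actual code, and the function names and values are illustrative only:

```cpp
// Illustrative sketch of top-k MoE routing (not llama.cpp's implementation).
// Selects the k highest-scoring experts for a token and normalizes their
// router weights with a softmax over just those k experts.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <utility>
#include <vector>

// Return the indices of the k largest router logits (the experts to run)
// paired with their softmax-normalized weights.
static std::vector<std::pair<int, float>> route_top_k(
        const std::vector<float> & logits, int k) {
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return logits[a] > logits[b]; });

    // softmax over the selected experts only
    float max_l = logits[idx[0]];
    float sum   = 0.0f;
    std::vector<std::pair<int, float>> out;
    for (int i = 0; i < k; ++i) {
        float w = std::exp(logits[idx[i]] - max_l);
        sum += w;
        out.push_back({idx[i], w});
    }
    for (auto & e : out) {
        e.second /= sum;
    }
    return out;
}

int main() {
    // 8 experts, as in Mixtral; routing 2 of them is the default.
    std::vector<float> logits = {0.1f, 2.3f, -0.5f, 1.7f, 0.0f, -1.2f, 0.9f, 0.4f};
    for (const auto & [expert, weight] : route_top_k(logits, 2)) {
        std::printf("expert %d, weight %.3f\n", expert, weight);
    }
}
```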

The experts.txt file lets you customize this number from the default of 2.
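
A minimal sketch of how such an override might be read; the experts.txt file name comes from this release, but the parsing and clamping logic below is an assumption, not the build's actual code:

```cpp
// Hypothetical sketch: read the per-token expert count from experts.txt,
// falling back to the default of 2 and clamping to the valid range 1..8.
#include <algorithm>
#include <cstdio>
#include <fstream>

static int read_expert_count(const char * path = "experts.txt") {
    int n = 2;                    // Mixtral default: 2 experts per token
    std::ifstream f(path);
    if (f >> n) {
        n = std::clamp(n, 1, 8); // 1 = fastest, 8 = slowest
    } else {
        n = 2;                    // missing or unreadable file: keep the default
    }
    return n;
}

int main() {
    std::printf("routing %d experts per token\n", read_expert_count());
}
```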

It is built with cuBLAS support.