
Custom MoE Routing (llama.cpp)

@kalomaze released this 13 Dec 04:25

In Mixtral's MoE setup, the default number of experts routed per token is 2.

This is a custom build of llama.cpp, modified from the Mixtral PR, that lets you customize the number of experts routed per token, from 1 expert (fastest) to 8 experts (slowest).
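
For context, here is a minimal standalone sketch of what top-k expert routing does conceptually; it is not the build's actual code, and the function names and values are illustrative only:

```cpp
// Illustrative sketch of top-k MoE routing (not llama.cpp's implementation).
// Selects the k highest-scoring experts for a token and normalizes their
// router weights with a softmax over just those k experts.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <utility>
#include <vector>

// Return the indices of the k largest router logits (the experts to run)
// paired with their softmax-normalized weights.
static std::vector<std::pair<int, float>> route_top_k(
        const std::vector<float> & logits, int k) {
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return logits[a] > logits[b]; });

    // softmax over the selected experts only
    float max_l = logits[idx[0]];
    float sum   = 0.0f;
    std::vector<std::pair<int, float>> out;
    for (int i = 0; i < k; ++i) {
        float w = std::exp(logits[idx[i]] - max_l);
        sum += w;
        out.push_back({idx[i], w});
    }
    for (auto & e : out) {
        e.second /= sum;
    }
    return out;
}

int main() {
    // 8 experts, as in Mixtral; routing 2 of them is the default.
    std::vector<float> logits = {0.1f, 2.3f, -0.5f, 1.7f, 0.0f, -1.2f, 0.9f, 0.4f};
    for (const auto & [expert, weight] : route_top_k(logits, 2)) {
        std::printf("expert %d, weight %.3f\n", expert, weight);
    }
}
```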

The experts.txt file lets you customize this number from the default of 2.
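
A minimal sketch of how such an override might be read; the experts.txt file name comes from this release, but the parsing and clamping logic below is an assumption, not the build's actual code:

```cpp
// Hypothetical sketch: read the per-token expert count from experts.txt,
// falling back to the default of 2 and clamping to the valid range 1..8.
#include <algorithm>
#include <cstdio>
#include <fstream>

static int read_expert_count(const char * path = "experts.txt") {
    int n = 2;                    // Mixtral default: 2 experts per token
    std::ifstream f(path);
    if (f >> n) {
        n = std::clamp(n, 1, 8); // 1 = fastest, 8 = slowest
    } else {
        n = 2;                    // missing or unreadable file: keep the default
    }
    return n;
}

int main() {
    std::printf("routing %d experts per token\n", read_expert_count());
}
```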

It is built with cuBLAS support.