# Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

Ruisi Cai¹, Yeonju Ro¹, Geon-Woo Kim¹, Peihao Wang¹, Babak Ehteshami Bejnordi², Aditya Akella¹, Zhangyang Wang¹

¹University of Texas at Austin, ²Qualcomm AI Research

## Usage

The code is based on the Hugging Face [Transformers](https://github.com/huggingface/transformers) repository. We modified `src/transformers/models/llama/modeling_llama.py` to integrate the MoE-fication process.
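For intuition, here is a heavily simplified sketch of what a router-decoupled MoE feed-forward block might look like. This is our illustration, not the repository's code: all class and parameter names (`ExpertMLP`, `RouterDecoupledMoE`, `expert_ids`, etc.) are hypothetical, and the actual modifications in `modeling_llama.py` may differ substantially.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """One expert: a small SwiGLU-style slice of the original dense FFN."""

    def __init__(self, hidden_size: int, expert_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, expert_dim, bias=False)
        self.up_proj = nn.Linear(hidden_size, expert_dim, bias=False)
        self.down_proj = nn.Linear(expert_dim, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class RouterDecoupledMoE(nn.Module):
    """Hypothetical MoE FFN whose routing decision is computed externally.

    The router is *decoupled*: expert indices are passed in rather than
    produced by a per-layer gating network inside this module.
    """

    def __init__(self, hidden_size: int, expert_dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertMLP(hidden_size, expert_dim) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size); expert_ids: (num_tokens,)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

The decoupling matters because `expert_ids` comes from a standalone pre-gating router rather than a per-layer gate, so the serving system can see routing decisions ahead of time and use them for expert-aware batching and caching.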

The main scripts are located in the `moefication` directory. Start by running the preprocessing scripts, `moefication/scripts/preprocess_1.sh` and `moefication/scripts/preprocess_2.sh`, to generate the experts. After preprocessing, train the model using `moefication/scripts/train.sh`, as shown below.
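In other words, the workflow is (a minimal sketch, assuming the scripts are run from the repository root with their default configurations):

```bash
# Generate the experts from the dense checkpoint (preprocessing)
bash moefication/scripts/preprocess_1.sh
bash moefication/scripts/preprocess_2.sh

# Train the resulting router-decoupled MoE model
bash moefication/scripts/train.sh
```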

## Citation

If you find this work useful, please cite our paper:

```bibtex
@inproceedings{cai2024textitreadme,
  title={$\textit{Read-ME}$: Refactorizing {LLM}s as Router-Decoupled Mixture of Experts with System Co-Design},
  author={Ruisi Cai and Yeonju Ro and Geon-Woo Kim and Peihao Wang and Babak Ehteshami Bejnordi and Aditya Akella and Zhangyang Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=i8JaxY7tDI}
}
```