# Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

Ruisi Cai¹, Yeonju Ro¹, Geon-Woo Kim¹, Peihao Wang¹, Babak Ehteshami Bejnordi², Aditya Akella¹, Zhangyang Wang¹

¹University of Texas at Austin, ²Qualcomm AI Research

## Usage

The code is based on the Hugging Face [Transformers](https://github.com/huggingface/transformers) repository. We modified `src/transformers/models/llama/modeling_llama.py` to integrate the MoE-fication process.
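For intuition, here is a heavily simplified sketch of what a router-decoupled MoE feed-forward block might look like. This is our illustration, not the repository's code: all class and parameter names (`ExpertMLP`, `RouterDecoupledMoE`, `expert_ids`, etc.) are hypothetical, and the actual modifications in `modeling_llama.py` may differ substantially.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """One expert: a small SwiGLU-style slice of the original dense FFN."""

    def __init__(self, hidden_size: int, expert_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, expert_dim, bias=False)
        self.up_proj = nn.Linear(hidden_size, expert_dim, bias=False)
        self.down_proj = nn.Linear(expert_dim, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class RouterDecoupledMoE(nn.Module):
    """Hypothetical MoE FFN whose routing decision is computed externally.

    The router is *decoupled*: expert indices are passed in rather than
    produced by a per-layer gating network inside this module.
    """

    def __init__(self, hidden_size: int, expert_dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertMLP(hidden_size, expert_dim) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size); expert_ids: (num_tokens,)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

The decoupling matters because `expert_ids` comes from a standalone pre-gating router rather than a per-layer gate, so the serving system can see routing decisions ahead of time and use them for expert-aware batching and caching.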

The main scripts are located in the `moefication` directory. Start by running the preprocessing scripts, `moefication/scripts/preprocess_1.sh` and `moefication/scripts/preprocess_2.sh`, to generate the experts. After preprocessing, train the model using `moefication/scripts/train.sh`, as shown below.
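In other words, the workflow is (a minimal sketch, assuming the scripts are run from the repository root with their default configurations):

```bash
# Generate the experts from the dense checkpoint (preprocessing)
bash moefication/scripts/preprocess_1.sh
bash moefication/scripts/preprocess_2.sh

# Train the resulting router-decoupled MoE model
bash moefication/scripts/train.sh
```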

## Citation

If you find this work useful, please cite our paper:

```bibtex
@inproceedings{cai2024textitreadme,
  title={$\textit{Read-ME}$: Refactorizing {LLM}s as Router-Decoupled Mixture of Experts with System Co-Design},
  author={Ruisi Cai and Yeonju Ro and Geon-Woo Kim and Peihao Wang and Babak Ehteshami Bejnordi and Aditya Akella and Zhangyang Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=i8JaxY7tDI}
}
```