Releases · FlagOpen/FlagScale
v0.8.0
- Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.
- Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta); a conceptual sketch follows this list.
- Upgraded DeepSeek-V3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models such as DeepSeek-V3.
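
Prefill-decoding disaggregation splits inference into two roles: a prefill worker that processes the whole prompt and builds the KV cache, and a decode worker that consumes that cache to generate tokens, potentially on a chip from a different vendor with the hand-off carried by FlagCX. The sketch below is a minimal, self-contained illustration of that flow only; the function names and data structures are assumptions and do not reflect the FlagScale or FlagCX APIs.

```python
# Conceptual sketch of prefill-decoding (PD) disaggregation, not the FlagScale/FlagCX API.
# The prefill worker builds the KV cache for the whole prompt; the decode worker
# receives that cache and generates tokens one at a time. In FlagScale the hand-off
# crosses vendor chips via FlagCX; here it is a plain in-process function call.

from dataclasses import dataclass
from typing import List

@dataclass
class KVCache:
    # One entry per token; real systems store per-layer key/value tensors.
    entries: List[str]

def prefill_worker(prompt_tokens: List[str]) -> KVCache:
    """Runs on the 'prefill' chip: processes the whole prompt in one pass."""
    return KVCache(entries=[f"kv({tok})" for tok in prompt_tokens])

def decode_worker(cache: KVCache, max_new_tokens: int) -> List[str]:
    """Runs on the 'decode' chip: autoregressive generation using the received cache."""
    generated = []
    for step in range(max_new_tokens):
        token = f"tok{step}"                  # placeholder for a real sampling step
        cache.entries.append(f"kv({token})")  # the cache grows as decoding proceeds
        generated.append(token)
    return generated

if __name__ == "__main__":
    cache = prefill_worker(["What", "is", "MoE", "?"])  # runs on chip A
    print(decode_worker(cache, max_new_tokens=4))       # runs on chip B
```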
v0.6.5
- Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.
- Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment; a conceptual auto-tuning sketch follows this list.
- Enhanced the CI/CD system to support more chips and integrated the workflow of FlagRelease.
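
Auto-tuning for serving amounts to searching a space of deployment configurations and keeping the one that benchmarks best. The sketch below illustrates only that search loop; `benchmark_throughput`, the configuration keys, and the toy cost model are assumptions, not FlagScale's implementation.

```python
# Conceptual sketch of auto-tuning for serving, not FlagScale's implementation.
# Enumerate candidate deployment configurations (e.g. tensor-parallel size, max batch
# size), benchmark each one briefly, and keep the configuration with the highest
# measured throughput. benchmark_throughput is a hypothetical stand-in for launching
# a serving instance and replaying sample requests against it.

from itertools import product

def benchmark_throughput(tp_size: int, max_batch: int) -> float:
    """Hypothetical benchmark: returns requests/second for one candidate config."""
    # Toy model: larger batches raise throughput, larger TP adds communication overhead.
    return max_batch / (1.0 + 0.1 * tp_size)

def auto_tune(tp_sizes, max_batches):
    best_cfg, best_rps = None, float("-inf")
    for tp, mb in product(tp_sizes, max_batches):
        rps = benchmark_throughput(tp, mb)
        if rps > best_rps:
            best_cfg, best_rps = {"tensor_parallel": tp, "max_batch": mb}, rps
    return best_cfg, best_rps

if __name__ == "__main__":
    cfg, rps = auto_tune(tp_sizes=[1, 2, 4], max_batches=[8, 16, 32])
    print(f"best config: {cfg}, estimated throughput: {rps:.1f} req/s")
```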
v0.6.0
- Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips; a layer-splitting sketch follows this list.
- Added comprehensive support for data processing and faster distributed training of LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
- Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.
- Implemented the auto-tuning feature to simplify large-scale distributed training, making it more accessible for users with less expertise.
- Enhanced the CI/CD system to enable more efficient unit testing across different backends and to perform loss checks across the various parallel strategies.
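
One dimension of heterogeneous parallelism is balancing pipeline stages across chips with different speeds: faster chips receive more transformer layers so that per-stage step time stays roughly even. The sketch below shows only that proportional split; the function name, chip speeds, and layer counts are assumptions, not FlagScale's scheduling logic.

```python
# Conceptual sketch of layer balancing for heterogeneous pipeline parallelism,
# not FlagScale's implementation. Layers are assigned to pipeline stages in
# proportion to each stage's relative chip speed.

def split_layers(num_layers: int, relative_speeds: list) -> list:
    """Assign layers to pipeline stages proportionally to each stage's chip speed."""
    total = sum(relative_speeds)
    counts = [int(num_layers * s / total) for s in relative_speeds]
    # Hand out any layers left over from rounding to the fastest stages first.
    remainder = num_layers - sum(counts)
    for idx in sorted(range(len(counts)), key=lambda i: -relative_speeds[i])[:remainder]:
        counts[idx] += 1
    return counts

if __name__ == "__main__":
    # Example: 32 layers over a 2-stage pipeline, chip A ~1.5x faster than chip B.
    print(split_layers(32, relative_speeds=[1.5, 1.0]))  # -> [20, 12]
```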
v0.3
v0.2
- Provide the actual training scheme used for Aquila2-70B-Expr, including the parallel strategies, optimizations, and hyperparameter settings.
- Support heterogeneous training on chips of different generations that share the same or compatible architectures, including NVIDIA GPUs and Iluvatar CoreX chips.
- Support training on Chinese domestic hardware, including Iluvatar CoreX and Baidu KUNLUN chips.