Releases · FlagOpen/FlagScale
v0.8.0
- Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.
- Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta); a conceptual sketch follows this list.
- Upgraded DeepSeek-V3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models such as DeepSeek-V3.
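
Prefill-decoding disaggregation splits inference into two roles: a prefill worker that processes the whole prompt and builds the KV cache, and a decode worker that consumes that cache to generate tokens, potentially on a chip from a different vendor with the hand-off carried by FlagCX. The sketch below is a minimal, self-contained illustration of that flow only; the function names and data structures are assumptions and do not reflect the FlagScale or FlagCX APIs.

```python
# Conceptual sketch of prefill-decoding (PD) disaggregation, not the FlagScale/FlagCX API.
# The prefill worker builds the KV cache for the whole prompt; the decode worker
# receives that cache and generates tokens one at a time. In FlagScale the hand-off
# crosses vendor chips via FlagCX; here it is a plain in-process function call.

from dataclasses import dataclass
from typing import List

@dataclass
class KVCache:
    # One entry per token; real systems store per-layer key/value tensors.
    entries: List[str]

def prefill_worker(prompt_tokens: List[str]) -> KVCache:
    """Runs on the 'prefill' chip: processes the whole prompt in one pass."""
    return KVCache(entries=[f"kv({tok})" for tok in prompt_tokens])

def decode_worker(cache: KVCache, max_new_tokens: int) -> List[str]:
    """Runs on the 'decode' chip: autoregressive generation using the received cache."""
    generated = []
    for step in range(max_new_tokens):
        token = f"tok{step}"                  # placeholder for a real sampling step
        cache.entries.append(f"kv({token})")  # the cache grows as decoding proceeds
        generated.append(token)
    return generated

if __name__ == "__main__":
    cache = prefill_worker(["What", "is", "MoE", "?"])  # runs on chip A
    print(decode_worker(cache, max_new_tokens=4))       # runs on chip B
```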
v0.6.5
- Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.
- Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment; a conceptual auto-tuning sketch follows this list.
- Enhanced the CI/CD system to support more chips and integrated the workflow of FlagRelease.
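
Auto-tuning for serving amounts to searching a space of deployment configurations and keeping the one that benchmarks best. The sketch below illustrates only that search loop; `benchmark_throughput`, the configuration keys, and the toy cost model are assumptions, not FlagScale's implementation.

```python
# Conceptual sketch of auto-tuning for serving, not FlagScale's implementation.
# Enumerate candidate deployment configurations (e.g. tensor-parallel size, max batch
# size), benchmark each one briefly, and keep the configuration with the highest
# measured throughput. benchmark_throughput is a hypothetical stand-in for launching
# a serving instance and replaying sample requests against it.

from itertools import product

def benchmark_throughput(tp_size: int, max_batch: int) -> float:
    """Hypothetical benchmark: returns requests/second for one candidate config."""
    # Toy model: larger batches raise throughput, larger TP adds communication overhead.
    return max_batch / (1.0 + 0.1 * tp_size)

def auto_tune(tp_sizes, max_batches):
    best_cfg, best_rps = None, float("-inf")
    for tp, mb in product(tp_sizes, max_batches):
        rps = benchmark_throughput(tp, mb)
        if rps > best_rps:
            best_cfg, best_rps = {"tensor_parallel": tp, "max_batch": mb}, rps
    return best_cfg, best_rps

if __name__ == "__main__":
    cfg, rps = auto_tune(tp_sizes=[1, 2, 4], max_batches=[8, 16, 32])
    print(f"best config: {cfg}, estimated throughput: {rps:.1f} req/s")
```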
v0.6.0
- Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips; a layer-splitting sketch follows this list.
- Added comprehensive support for data processing and faster distributed training of LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
- Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.
- Implemented the auto-tuning feature to simplify large-scale distributed training, making it more accessible for users with less expertise.
- Enhanced the CI/CD system to enable more efficient unit testing across different backends and to perform loss checks across the various parallel strategies.
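
One dimension of heterogeneous parallelism is balancing pipeline stages across chips with different speeds: faster chips receive more transformer layers so that per-stage step time stays roughly even. The sketch below shows only that proportional split; the function name, chip speeds, and layer counts are assumptions, not FlagScale's scheduling logic.

```python
# Conceptual sketch of layer balancing for heterogeneous pipeline parallelism,
# not FlagScale's implementation. Layers are assigned to pipeline stages in
# proportion to each stage's relative chip speed.

def split_layers(num_layers: int, relative_speeds: list) -> list:
    """Assign layers to pipeline stages proportionally to each stage's chip speed."""
    total = sum(relative_speeds)
    counts = [int(num_layers * s / total) for s in relative_speeds]
    # Hand out any layers left over from rounding to the fastest stages first.
    remainder = num_layers - sum(counts)
    for idx in sorted(range(len(counts)), key=lambda i: -relative_speeds[i])[:remainder]:
        counts[idx] += 1
    return counts

if __name__ == "__main__":
    # Example: 32 layers over a 2-stage pipeline, chip A ~1.5x faster than chip B.
    print(split_layers(32, relative_speeds=[1.5, 1.0]))  # -> [20, 12]
```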
v0.3
v0.2
- Provide the actual training scheme used for Aquila2-70B-Expr, including the parallel strategies, optimizations, and hyperparameter settings.
- Support heterogeneous training on chips of different generations that share the same or compatible architectures, including NVIDIA GPUs and Iluvatar CoreX chips.
- Support training on Chinese domestic hardware, including Iluvatar CoreX and Baidu KUNLUN chips.