Releases: axolotl-ai-cloud/axolotl

v0.8.1

08 Apr 00:50

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0

02 Apr 13:51
3877c5c

New Features

Sequence parallelism support via ring-flash-attn

This enables long-context training by distributing each sequence across GPUs, reducing per-device memory requirements and allowing context length to scale near-linearly with the number of GPUs. This complements Axolotl's other parallelism features, including FSDP and DeepSpeed. See our sequence parallelism documentation for details.
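
Concretely, enabling it is a config-level change. A minimal sketch, assuming the `sequence_parallel_degree` key described in the docs; the model, dataset, and length values are placeholders:

```yaml
# Illustrative sketch: split each sequence across 4 GPUs via ring-flash-attn.
# Requires flash attention and the ring-flash-attn package installed.
base_model: meta-llama/Llama-3.1-8B   # placeholder model

sequence_parallel_degree: 4   # number of GPUs each sequence is sharded across
flash_attention: true

sequence_len: 65536           # long context enabled by the sharding
micro_batch_size: 1

datasets:
  - path: tatsu-lab/alpaca    # placeholder dataset
    type: alpaca
```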

Gemma-3 support has landed, alongside several features to help you fine-tune Gemma-3 models (see the example config after this list):

  • Cut cross entropy
  • Liger kernel
  • Multimodal
  • Fixed loss calculation for Gradient Accumulation
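
A hedged sketch of how these pieces could be combined for a Gemma-3 fine-tune; the plugin paths and flags follow Axolotl's integration naming, but verify the exact names against your installed version:

```yaml
# Illustrative Gemma-3 fine-tune using the liger and cut-cross-entropy
# integrations; plugin paths and flags are assumptions to verify.
base_model: google/gemma-3-4b-it

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

liger_rms_norm: true
liger_glu_activation: true
cut_cross_entropy: true

# Gradient accumulation benefits from the corrected loss calculation.
gradient_accumulation_steps: 4
micro_batch_size: 1
```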

Multimodal: beta support for a variety of multimodal models (see the sketch after this list):

  • Mllama
  • Pixtral
  • Llava-1.5
  • Mistral-Small-3.1
  • Gemma-3
  • Qwen2-VL
  • Qwen2.5-VL
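
As a rough sketch, a multimodal fine-tune looks like a text config that loads a processor rather than only a tokenizer; the `processor_type` and `chat_template` values below are assumptions to check against the multimodal docs:

```yaml
# Illustrative multimodal (beta) config; key names are assumptions.
base_model: Qwen/Qwen2.5-VL-7B-Instruct
processor_type: AutoProcessor   # load the model's multimodal processor
chat_template: qwen2_vl

datasets:
  - path: HuggingFaceH4/llava-instruct-mix-vsft   # placeholder dataset
    type: chat_template
    split: train
```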

Additional Features

  • Updated cut-cross-entropy patches for several models: Cohere, Cohere-2, Gemma, Gemma-2, Gemma-3, Mistral-3, and Mllama
  • Support for the REX learning rate scheduler (https://arxiv.org/abs/2107.04197)
  • Tokenizer overrides - you can now fine-tune with custom values for reserved tokens in the tokenizer
  • Single-GPU and DDP support for the Muon optimizer
  • Sequential packing for curriculum learning (these four flags are sketched in the config after this list)
  • Faster GRPO training with distributed vLLM - you can now use axolotl vllm-serve path/to/config.yaml to serve a separate vLLM instance, which can use multiple GPUs to speed up trajectory generation during GRPO.
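
The scheduler, optimizer, packing, and tokenizer features above are plain config flags. A combined sketch; the key names mirror the feature names, but treat them as assumptions to verify against the docs:

```yaml
# Illustrative flags for the features above; names are assumptions.
lr_scheduler: rex             # REX learning rate schedule
optimizer: muon               # single-GPU / DDP only in this release
curriculum_sampling: true     # sequential, non-shuffled sample order

# Hypothetical override: give a reserved token a custom value.
added_tokens_overrides:
  128011: "<custom_tool_token>"
```

For GRPO, the separate vLLM instance is launched with the command quoted above (axolotl vllm-serve path/to/config.yaml), and the training job then runs as a second process that sends generation requests to it.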

Notes

v0.8.x will be the last series of releases to officially support torch<=2.4.1. With the PyTorch 2.7 release this month, we aim to support the latest two stable releases of PyTorch.
We expect FSDP2 support to be a fast follow; we'll include it in v0.8.1 once we fix and validate remaining issues such as checkpoint saving.

What's Changed


v0.7.1

26 Feb 07:47
75cbd15

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1

v0.7.0

18 Feb 09:27
3c743c4

What's Changed


v0.6.0

09 Dec 19:20
6aa31b4

What's Changed

New Contributors

Full Changelog: v0.5.2...v0.6.0

v0.5.2

19 Nov 17:45
e9c3a2a

What's Changed

New Contributors

Full Changelog: v0.5.0...v0.5.2

v0.5.0

10 Nov 02:10
e4af51e

What's Changed


v0.4.0

24 Jan 20:08
1427d5b

New Features (highlights)

  • Streaming multipack for continued pre-training
  • Mistral & Mixtral support
  • Simplified Multipack for Mistral, Falcon, Qwen2, and Phi
  • DPO/IPO/KTO-pairs RL-training support via trl
  • Improved BatchSampler for multipack, allowing resume from checkpoint and shuffling data each epoch
  • bf16: auto support
  • Added MLflow support
  • Save YAML configs to WandB
  • Save predictions during evals to WandB
  • More tests, including more smoke tests for small model training
  • NEFTune support (see the config sketch after this list)
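
Several of these highlights map directly to config flags. A minimal sketch; bf16: auto comes straight from the list above, while the remaining key names are assumptions drawn from later docs:

```yaml
# Illustrative flags for the v0.4.0 highlights; verify names against the docs.
bf16: auto                    # pick bf16 automatically when the GPU supports it
neftune_noise_alpha: 5        # NEFTune embedding-noise strength

# Experiment tracking (values are placeholders)
mlflow_tracking_uri: http://127.0.0.1:5000
wandb_project: my-finetune    # YAML config and eval predictions logged to WandB
```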

What's Changed


v0.3.0

19 Sep 20:29

What's Changed


v0.2.1

13 Jun 19:19
06652c1

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.2.1