Releases: axolotl-ai-cloud/axolotl

v0.8.1

08 Apr 00:50

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0

02 Apr 13:51
3877c5c

New Features

Sequence parallelism support via ring-flash-attn

This enables long-context training by distributing each sequence across GPUs, reducing per-device memory requirements and allowing context length to scale near-linearly with the number of GPUs. This complements Axolotl's other parallelism features, including FSDP and DeepSpeed. See our sequence parallelism documentation for details.
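
Concretely, enabling it is a config-level change. A minimal sketch, assuming the `sequence_parallel_degree` key described in the docs; the model, dataset, and length values are placeholders:

```yaml
# Illustrative sketch: split each sequence across 4 GPUs via ring-flash-attn.
# Requires flash attention and the ring-flash-attn package installed.
base_model: meta-llama/Llama-3.1-8B   # placeholder model

sequence_parallel_degree: 4   # number of GPUs each sequence is sharded across
flash_attention: true

sequence_len: 65536           # long context enabled by the sharding
micro_batch_size: 1

datasets:
  - path: tatsu-lab/alpaca    # placeholder dataset
    type: alpaca
```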

Gemma-3 support has landed, alongside several features to help you fine-tune Gemma-3 models (see the example config after this list):

  • Cut cross entropy
  • Liger kernel
  • Multimodal
  • Fixed loss calculation for Gradient Accumulation
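
A hedged sketch of how these pieces could be combined for a Gemma-3 fine-tune; the plugin paths and flags follow Axolotl's integration naming, but verify the exact names against your installed version:

```yaml
# Illustrative Gemma-3 fine-tune using the liger and cut-cross-entropy
# integrations; plugin paths and flags are assumptions to verify.
base_model: google/gemma-3-4b-it

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

liger_rms_norm: true
liger_glu_activation: true
cut_cross_entropy: true

# Gradient accumulation benefits from the corrected loss calculation.
gradient_accumulation_steps: 4
micro_batch_size: 1
```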

Multimodal: beta support for a variety of multimodal models (see the sketch after this list):

  • Mllama
  • Pixtral
  • Llava-1.5
  • Mistral-Small-3.1
  • Gemma-3
  • Qwen2-VL
  • Qwen2.5-VL
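
As a rough sketch, a multimodal fine-tune looks like a text config that loads a processor rather than only a tokenizer; the `processor_type` and `chat_template` values below are assumptions to check against the multimodal docs:

```yaml
# Illustrative multimodal (beta) config; key names are assumptions.
base_model: Qwen/Qwen2.5-VL-7B-Instruct
processor_type: AutoProcessor   # load the model's multimodal processor
chat_template: qwen2_vl

datasets:
  - path: HuggingFaceH4/llava-instruct-mix-vsft   # placeholder dataset
    type: chat_template
    split: train
```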

Additional Features

  • Updated cut-cross-entropy patches for several models: Cohere, Cohere-2, Gemma, Gemma-2, Gemma-3, Mistral-3, and Mllama
  • Support for the REX learning rate scheduler (https://arxiv.org/abs/2107.04197)
  • Tokenizer overrides - you can now fine-tune with custom values for reserved tokens in the tokenizer
  • Single-GPU and DDP support for the Muon optimizer
  • Sequential packing for curriculum learning (these four flags are sketched in the config after this list)
  • Faster GRPO training with distributed vLLM - you can now use axolotl vllm-serve path/to/config.yaml to serve a separate vLLM instance, which can use multiple GPUs to speed up trajectory generation during GRPO.
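
The scheduler, optimizer, packing, and tokenizer features above are plain config flags. A combined sketch; the key names mirror the feature names, but treat them as assumptions to verify against the docs:

```yaml
# Illustrative flags for the features above; names are assumptions.
lr_scheduler: rex             # REX learning rate schedule
optimizer: muon               # single-GPU / DDP only in this release
curriculum_sampling: true     # sequential, non-shuffled sample order

# Hypothetical override: give a reserved token a custom value.
added_tokens_overrides:
  128011: "<custom_tool_token>"
```

For GRPO, the separate vLLM instance is launched with the command quoted above (axolotl vllm-serve path/to/config.yaml), and the training job then runs as a second process that sends generation requests to it.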

Notes

v0.8.x will be the last series of releases to officially support torch<=2.4.1. With the PyTorch 2.7 release this month, we aim to support the latest two stable releases of PyTorch.
We expect FSDP2 support to be a fast follow; we'll include it in v0.8.1 once we fix and validate remaining issues such as checkpoint saving.

What's Changed


v0.7.1

26 Feb 07:47
75cbd15

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1

v0.7.0

18 Feb 09:27
3c743c4

What's Changed


v0.6.0

09 Dec 19:20
6aa31b4

What's Changed

New Contributors

Full Changelog: v0.5.2...v0.6.0

v0.5.2

19 Nov 17:45
e9c3a2a

What's Changed

New Contributors

Full Changelog: v0.5.0...v0.5.2

v0.5.0

10 Nov 02:10
e4af51e

What's Changed


v0.4.0

24 Jan 20:08
1427d5b

New Features (highlights)

  • Streaming multipack for continued pre-training
  • Mistral & Mixtral support
  • Simplified Multipack for Mistral, Falcon, Qwen2, and Phi
  • DPO/IPO/KTO-pairs RL-training support via trl
  • Improved BatchSampler for multipack, allowing resume from checkpoint and shuffling data each epoch
  • bf16: auto support
  • Added MLflow support
  • Save YAML configs to WandB
  • Save predictions during evals to WandB
  • More tests, including more smoke tests for small model training
  • NEFTune support (see the config sketch after this list)
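
Several of these highlights map directly to config flags. A minimal sketch; bf16: auto comes straight from the list above, while the remaining key names are assumptions drawn from later docs:

```yaml
# Illustrative flags for the v0.4.0 highlights; verify names against the docs.
bf16: auto                    # pick bf16 automatically when the GPU supports it
neftune_noise_alpha: 5        # NEFTune embedding-noise strength

# Experiment tracking (values are placeholders)
mlflow_tracking_uri: http://127.0.0.1:5000
wandb_project: my-finetune    # YAML config and eval predictions logged to WandB
```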

What's Changed


v0.3.0

19 Sep 20:29

What's Changed


v0.2.1

13 Jun 19:19
06652c1

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.2.1