What's Changed
- add additional tf32 opt for cudnn by @winglian in #2477
- fix(example): align example to correct adapter by @NanoCode012 in #2478
- remove DeepSpeed guard for LoRA Triton kernels by @djsaunde in #2480
- check if fixture exists in the cache already by @winglian in #2485
- simplify the example configs to be more minimal and less daunting by @winglian in #2486
- fix: cohere cce scaling wrong tensor by @NanoCode012 in #2483
- fix tokenizer overrides with gemma3 by @winglian in #2488
- Update dependencies and show slow tests in CI by @winglian in #2492
- Flex Attention + Packing with BlockMask support by @bursteratom in #2363
- FSDP2 support by @winglian in #2469
- llama4 support by @winglian in #2493
- feat: add llama4 multimodal by @NanoCode012 in #2499
- fix: duplicate llama4 chattemplate enum by @NanoCode012 in #2500
- fix(doc): clarify roles mapping in chat_template by @NanoCode012 in #2490
- Feat: Add doc on loading datasets and support for Azure/OCI by @NanoCode012 in #2482
- SP cu_seqlens fix, refactor by @djsaunde in #2495
- feat: add llama4 CCE by @NanoCode012 in #2498
- Llama4 linearized by @winglian in #2502
Full Changelog: v0.8.0...v0.8.1