Many great updates!
What's Changed
- add missing r1 prompt style by @ali-alshaar7 in #1929
- fix incremental save for PyTorch 2.6 by @t-vi in #1928
- Update pyproject.toml by @syntheticgio in #1939
- fix: resolve failing CI by @Borda in #1944
- handle wrapped thundermodules in generate by @t-vi in #1955
- fix skip condition by @t-vi in #1956
- ci: use HF cache by @Borda in #1958
- nits for CI by @Andrei-Aksionov in #1940
- ci: split HF caching by @Borda in #1960
- bump: PT 2.6 + `bitsandbytes` & standalone tests by @Borda in #1959
- prune whitespaces for code readability by @Borda in #1962
- fixing various typos in examples & tutorials by @Borda in #1963
- fix `n_query_groups` for llama-3.1-405b by @ysjprojects in #1946
- tests: make flaky test due to connection issues by @Borda in #1964
- Fix: incorrect gradient accumulation steps bug by @ysjprojects in #1947
- fix: use default `num_nodes=1` for back-compatibility by @Borda in #1967
- Do not wrap LoRA layers with FSDP by @janEbert in #1538
- Speculative decoding: Base implementation by @Andrei-Aksionov in #1938
- Better clarity on SFT dataset attributes by @ysjprojects in #1970
- Enforce Consistent Formatting and Validation for YAML Files by @Borda in #1977
- Apply Standard Formatting and Fix Import & Test Name Issues by @Borda in #1981
- Setting `config.sliding_window_layer_stride` explicitly by @ysjprojects in #1972
- feat: add linear rope type by @k223kim in #1982
- feat: update tests for transformers 4.50.2 by @k223kim in #1983
- fix: `test_tokenizer_against_hf` by @Borda in #1984
- feat: replace sliding window type with offset by @k223kim in #1989
- ci: with `pull_request_target` by @Borda in #1992
- Phi4 mini by @ysjprojects in #1949
- aggregate `val_loss` by @ysjprojects in #1971
- feat: add local base freq for rope by @k223kim in #1993
- test: flexible wait for serve start by @Borda in #1996
- fix: replace sliding window configuration parameters to sliding windows indices by @k223kim in #1995
- QwQ-32B by @ysjprojects in #1952
- feat: run thunder tests as part of LitGPT CI by @deependujha in #1975
- try pyupgrade-up py38 by @Borda in #1999
- [1/4] feat: add gemma 3 27b by @k223kim in #1998
- [2/4] add gemma 3 1b by @k223kim in #2000
- feat: add gemma-3-12b by @k223kim in #2002
- [3/4] feat: add gemma 3 4b by @k223kim in #2001
- Add resume for adapter_v2, enable continued finetuning for adapter by @altria-zewei-wang in #1354
- Fix/loading gemma 3 1b by @pquadri in #2004
- feat: add gemma 3 in readme and tutorials by @k223kim in #2005
- add borda as codeowner by @t-vi in #2007
- example for full finetuning with python code by @astrobdr in #1331
- feat: add tests for gemma3 by @k223kim in #2006
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2009
- building tutorials as mkdocs pages by @Borda in #2011
- Add mlflow logger support by @topikachu in #1985
- fix support for `litserve>0.2.4` by @ali-alshaar7 in #1994
- Cast tensors in KVCache only when needed by @Andrei-Aksionov in #2017
- feat: load only text weights from multimodal gemma by @pquadri in #2008
- Feature: Adds support for OpenAISpec in litgpt serve by @bhimrazy in #1943
- fix typo by @Lynsoo in #2018
- drop upper bounds in dependencies by @t-vi in #2022
- prepare 0.5.8 by @t-vi in #2023
New Contributors
- @syntheticgio made their first contribution in #1939
- @deependujha made their first contribution in #1975
- @altria-zewei-wang made their first contribution in #1354
- @pquadri made their first contribution in #2004
- @astrobdr made their first contribution in #1331
- @pre-commit-ci made their first contribution in #2009
- @topikachu made their first contribution in #1985
- @bhimrazy made their first contribution in #1943
- @Lynsoo made their first contribution in #2018
Full Changelog: v0.5.7...v0.5.8