Releases · Lightning-AI/litgpt
v0.5.8
Many great updates!
What's Changed
- add missing r1 prompt style by @ali-alshaar7 in #1929
- fix incremental save for PyTorch 2.6 by @t-vi in #1928
- Update pyproject.toml by @syntheticgio in #1939
- fix: resolve failing CI by @Borda in #1944
- handle wrapped thundermodules in generate by @t-vi in #1955
- fix skip condition by @t-vi in #1956
- ci: use HF cache by @Borda in #1958
- nits for CI by @Andrei-Aksionov in #1940
- ci: split HF caching by @Borda in #1960
- bump: PT 2.6 + `bitsandbytes` & standalone tests by @Borda in #1959
- prune whitespaces for code readability by @Borda in #1962
- fixing various typos in examples & tutorials by @Borda in #1963
- fix `n_query_groups` for llama-3.1-405b by @ysjprojects in #1946
- tests: mark flaky test due to connection issues by @Borda in #1964
- Fix: incorrect gradient accumulation steps bug by @ysjprojects in #1947
- fix: use default `num_nodes=1` for back-compatibility by @Borda in #1967
- Do not wrap LoRA layers with FSDP by @janEbert in #1538
- Speculative decoding: Base implementation by @Andrei-Aksionov in #1938 (see the sketch after this list)
- Better clarity on SFT dataset attributes by @ysjprojects in #1970
- Enforce Consistent Formatting and Validation for YAML Files by @Borda in #1977
- Apply Standard Formatting and Fix Import & Test Name Issues by @Borda in #1981
- Setting `config.sliding_window_layer_stride` explicitly by @ysjprojects in #1972
- feat: add linear rope type by @k223kim in #1982
- feat: update tests for transformers 4.50.2 by @k223kim in #1983
- fix: `test_tokenizer_against_hf` by @Borda in #1984
- feat: replace sliding window type with offset by @k223kim in #1989
- ci: with `pull_request_target` by @Borda in #1992
- Phi4 mini by @ysjprojects in #1949
- aggregate `val_loss` by @ysjprojects in #1971
- feat: add local base freq for rope by @k223kim in #1993
- test: flexible wait for serve start by @Borda in #1996
- fix: replace sliding window configuration parameters with sliding window indices by @k223kim in #1995
- QwQ-32B by @ysjprojects in #1952
- feat: run thunder tests as part of LitGPT CI by @deependujha in #1975
- try pyupgrade-up py38 by @Borda in #1999
- [1/4] feat: add gemma 3 27b by @k223kim in #1998
- [2/4] add gemma 3 1b by @k223kim in #2000
- feat: add gemma-3-12b by @k223kim in #2002
- [3/4] feat: add gemma 3 4b by @k223kim in #2001
- Add resume for adapter_v2, enable continued finetuning for adapter by @altria-zewei-wang in #1354
- Fix/loading gemma 3 1b by @pquadri in #2004
- feat: add gemma 3 in readme and tutorials by @k223kim in #2005
- add borda as codeowner by @t-vi in #2007
- example for full finetuning with python code by @astrobdr in #1331
- feat: add tests for gemma3 by @k223kim in #2006
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2009
- building tutorials as mkdocs pages by @Borda in #2011
- Add mlflow logger support by @topikachu in #1985
- fix support for `litserve>0.2.4` by @ali-alshaar7 in #1994
- Cast tensors in KVCache only when needed by @Andrei-Aksionov in #2017
- feat: load only text weights from multimodal gemma by @pquadri in #2008
- Feature: Adds support for OpenAISpec in litgpt serve by @bhimrazy in #1943 (usage example after this list)
- fix typo by @Lynsoo in #2018
- drop upper bounds in dependencies by @t-vi in #2022
- prepare 0.5.8 by @t-vi in #2023
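For context on the speculative decoding entry (#1938): a small draft model proposes several tokens cheaply, and the large target model verifies them in a single forward pass. Below is a minimal greedy-verification sketch of the general technique, not LitGPT's actual implementation; `draft_model` and `target_model` are assumed to be callables mapping a `(1, T)` tensor of token ids to `(1, T, vocab)` logits.

```python
import torch

def speculative_generate(target_model, draft_model, tokens, k=4, max_len=64):
    """Greedy speculative decoding sketch: a cheap draft model proposes k
    tokens, the expensive target model verifies them in one forward pass."""
    while tokens.size(0) < max_len:
        # 1) Draft k candidate tokens autoregressively with the small model.
        draft = tokens.clone()
        for _ in range(k):
            logits = draft_model(draft.unsqueeze(0))[0, -1]
            draft = torch.cat([draft, logits.argmax().view(1)])
        # 2) Score the whole draft with the target model in a single pass.
        verified = target_model(draft.unsqueeze(0))[0].argmax(dim=-1)
        proposed = draft[tokens.size(0):]          # the k drafted tokens
        expected = verified[tokens.size(0) - 1:]   # target's pick per slot (+1 bonus)
        # 3) Accept the longest agreeing prefix, plus one "free" target token.
        agree = (proposed == expected[: proposed.size(0)]).long()
        n_accept = int(agree.cumprod(0).sum())
        tokens = torch.cat([tokens, expected[: n_accept + 1]])
    return tokens
```

Each iteration emits at least one token (the target's own prediction), so output matches target-only greedy decoding while the target runs far fewer forward passes whenever the draft agrees often.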
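And for the OpenAISpec entry (#1943): once `litgpt serve` exposes an OpenAI-compatible endpoint, any OpenAI-style client can talk to it. A hypothetical request against typical litserve defaults; host, port, path, and model name here are assumptions, so check the serve docs for the exact flags.

```python
import requests

# Assumed local endpoint following the OpenAI chat-completions spec.
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "litgpt",  # model name is illustrative
        "messages": [{"role": "user", "content": "Hello, what is LitGPT?"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```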
New Contributors
- @syntheticgio made their first contribution in #1939
- @deependujha made their first contribution in #1975
- @altria-zewei-wang made their first contribution in #1354
- @pquadri made their first contribution in #2004
- @astrobdr made their first contribution in #1331
- @pre-commit-ci made their first contribution in #2009
- @topikachu made their first contribution in #1985
- @bhimrazy made their first contribution in #1943
- @Lynsoo made their first contribution in #2018
Full Changelog: v0.5.7...v0.5.8
v0.5.7
What's Changed
- Add Deepseek r1 distill llama models by @ali-alshaar7 in #1922
New Contributors
- @ali-alshaar7 made their first contribution in #1922
Full Changelog: v0.5.6...v0.5.7
v0.5.6
v0.5.5
What's Changed
- Post-release setup for 0.5.5.dev1 by @Andrei-Aksionov in #1885
- Falcon3 by @ysjprojects in #1881
- ChatML prompt template by @ysjprojects in #1882
- Small fixes and refactoring by @mseeger in #1861
- Drop interleave placement in QKV matrix by @Andrei-Aksionov in #1013
- Bump PyTorch, PyTorch-Lightning and BnB versions by @Andrei-Aksionov in #1893
- Pin version of mistune in check links workflow by @Andrei-Aksionov in #1895
- Skip converting .safetensors to .bin by @ysjprojects in #1853
- Some improvements for KV caching by @mseeger in #1891
- added query-key norm to accommodate OLMo2 by @ysjprojects in #1894 (see the sketch after this list)
- Improve HF download speed by @rasbt in #1899
- Bump version for 0.5.5 release by @rasbt in #1901
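On the query-key norm entry (#1894): OLMo 2 normalizes queries and keys before attention, which LitGPT's attention needed to accommodate. A minimal sketch of the idea; shapes and module names are assumptions rather than LitGPT's internals, and `nn.RMSNorm` needs PyTorch ≥ 2.4.

```python
import torch
from torch import nn
from torch.nn import functional as F

class QKNormAttention(nn.Module):
    """Attention fragment with per-head RMSNorm on queries and keys."""

    def __init__(self, head_dim: int):
        super().__init__()
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, q, k, v):
        # q, k, v: (batch, n_heads, seq_len, head_dim)
        q = self.q_norm(q)  # normalizing q/k bounds the attention logits,
        k = self.k_norm(k)  # which helps stabilize training of deep models
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```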
Full Changelog: v0.5.4...v0.5.5
v0.5.4
What's Changed
- 0.5.3 post release setup by @rasbt in #1817
- Add cff file by @rasbt in #1818
- Deprecate Support for Dolly, Nous-Hermes, Redpajama-Incite, Vicuna and H2O Danube Models. by @ParagEkbote in #1821
- Adding OLMo by @aflah02 in #1827
- Adding Qwen2.5 by @ysjprojects in #1834
- Restore SlimPajama preprocessing code by @aflah02 in #1840
- Add QwQ-32B-Preview by @ysjprojects in #1844
- Add Mixtral-8x22B by @ysjprojects in #1845
- add Llama-3.3-70B-Instruct by @ysjprojects in #1859
- add Salamandra by @ysjprojects in #1857
- Qwen2.5: fix block size for Coder series by @ysjprojects in #1856
- fix: add missing "," by @vra in #1855
- fix llama3.3 readme url by @ysjprojects in #1862
- Set `torch.load(..., weights_only=False)` in litgpt/api.py by @Andrei-Aksionov in #1874 (see the example after this list)
- Add Qwen2.5 math by @ysjprojects in #1863
- Add SmolLM2 by @ysjprojects in #1848
- Add Mistral-Large-Instruct-2411 by @ysjprojects in #1876
- Bump version for 0.5.4 release by @Andrei-Aksionov in #1883
- Temporarily remove Thunder to make a release by @Andrei-Aksionov in #1884
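The `weights_only` change above matters because PyTorch has been moving `torch.load` toward `weights_only=True` by default, which rejects checkpoints containing arbitrary pickled objects. The fix pins the old behaviour explicitly:

```python
import torch

# weights_only=True (the newer default) only unpickles tensors and a small
# allowlist of types; checkpoints holding other Python objects need
# weights_only=False. Only do this for checkpoints you trust, since
# unrestricted unpickling can execute arbitrary code.
state = torch.load("checkpoint.pth", map_location="cpu", weights_only=False)
```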
New Contributors
- @ParagEkbote made their first contribution in #1821
- @ysjprojects made their first contribution in #1834
- @vra made their first contribution in #1855
Full Changelog: v0.5.3...v0.5.4
v0.5.3
What's Changed
- Post-release setup for 0.5.3.dev1 by @rasbt in #1799
- Add Phi 3 128k model by @deveworld in #1800
- Add token counts to compute performance by @rasbt in #1801
- Fixed the issue that precision is always "32-true". by @jianpingw in #1802
- Add Nvidia Llama 3.1 70B Nemotron weights by @rasbt in #1803
- Choose evaluation example from test set by @rasbt in #1804
- Pretrain tok sec by @rasbt in #1805
- typo in convert_to_litgpt command by @wasifferoze in #1807
- Move distributed all_reduce import into a function by @IvanYashchuk in #1810 (see the sketch after this list)
- Remove hardcoded 32-precision conversion by @rasbt in #1814
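The `all_reduce` change above is the deferred-import pattern: pulling `torch.distributed` inside the function keeps it off the module-import path, presumably so single-device runs and tracing compilers never have to touch it. A generic sketch; the PR's actual function may differ.

```python
import torch

def maybe_all_reduce(tensor: torch.Tensor) -> torch.Tensor:
    # Import inside the function so torch.distributed is only loaded
    # when a process group might actually exist.
    import torch.distributed as dist

    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(tensor)
    return tensor
```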
New Contributors
- @deveworld made their first contribution in #1800
- @jianpingw made their first contribution in #1802
- @wasifferoze made their first contribution in #1807
- @IvanYashchuk made their first contribution in #1810
Full Changelog: v0.5.2...v0.5.3
v0.5.2
v0.5.1
What's Changed
- v0.5.0 post release setup by @rasbt in #1774
- Be more specific about missing RoPE parameters by @rasbt in #1781
- Use correct Llama 3.1 and 3.2 context lengths by @rasbt in #1779
- Fixing Llama 3.1 and 3.2 Maximum Context Length by @rasbt in #1782
- Use more realistic RoPE tests by @rasbt in #1785
- AMD (MI250X) support by @TensorTemplar in #1775
- Tidy up RoPE by @rasbt in #1786 (see the sketch after this list)
- Bump version for 0.5.1 bugfix release by @rasbt in #1787
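Several of the entries above touch RoPE, so a refresher helps: rotary position embeddings encode position by rotating pairs of query/key channels through angles proportional to the token index. A minimal non-interleaved sketch; Llama 3.1/3.2 additionally rescale the frequencies for long contexts, which is what the fixes above adjust.

```python
import torch

def apply_rope(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (..., seq_len, head_dim) by position-
    dependent angles; head_dim must be even."""
    *_, seq_len, head_dim = x.shape
    half = head_dim // 2
    # One frequency per channel pair: theta ** (-2i / head_dim)
    freqs = theta ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)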
New Contributors
- @TensorTemplar made their first contribution in #1775
Full Changelog: v0.5.0...v0.5.1
v0.5.0
What's Changed
- Post 0.4.13 release set up by @rasbt in #1755
- Add missing explanation on how to use a finetuned model by @rasbt in #1756
- Bump lightning version to latest stable release (2.4.0) by @rasbt in #1765
- Improve rope by @rasbt in #1745
- Add bnb.nn.StableEmbedding for quantized training by @rasbt in #1770 (see the example after this list)
- [fix][1760] Added fix for the missing `context` key issue in dolly! by @pytholic in #1766
- Fix Llama 3.2 tokenizer by @rasbt in #1772
Full Changelog: v0.4.13...v0.5.0
v0.4.13
What's Changed
- Make 0.4.13.dev1 version by @rasbt in #1722
- Enable MPS support for LitGPT by @rasbt in #1724
- Simplify MPS support by @rasbt in #1726
- Add Chainlit Studio by @rasbt in #1728
- Fixing the tokenizer for slimpajama data preparation by @tomaslaz in #1734
- Add pretrain conversion by @rasbt in #1735
- Typo fix and formatting improvements in API Trainer docs by @rasbt in #1736
- bump macos to m1 by @t-vi in #1725
- Improve filepath handling in unit tests by @rasbt in #1737
- Add a more informative message in case text exceeds context size by @rasbt in #1738
- Update Thunder README.md by @rasbt in #1740
- Add sliding window attention to Mistral and Phi 3 by @rasbt in #1741 (see the sketch after this list)
- Extend context length for sliding window tests by @rasbt in #1742
- Fix jsonargparse version by @rasbt in #1748
- Update RoPE tests by @rasbt in #1746
- Make json parsing more robust by @rasbt in #1749
- Support for optimizers which don't have a "fused" parameter, such as grokadamw and 8-bit bnb by @mtasic85 in #1744
- Increase rtol and atol in Gemma 2 for macOS by @rasbt in #1751
- Repair json files by @rasbt in #1752
- Llama 3.2 weights by @rasbt in #1750
- Bump version to 0.4.13 for new release by @rasbt in #1753
- Temporarily take out thunder dependency for deployment by @rasbt in #1754
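On the sliding window attention entries above: instead of full causal attention, each position attends only to the last `window` tokens. A sketch of the boolean mask (True = may attend), usable with `F.scaled_dot_product_attention`:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask limited to a lookback of `window` tokens:
    position i attends to j where i - window < j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)
```

Bounding the lookback caps attention memory for long sequences, which is presumably why the tests in #1742 extend the context length past the window.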
Full Changelog: v0.4.12...v0.4.13