Releases · databricks/compose-rl
v0.6.0
What's Changed
- Added verified answers to the logging by @abaheti95 in #63
- Adding GPU CI back by @dakinggg in #64
- Fix args propagation by @dakinggg in #65
- Fix weight propagation by @bcui-db in #66
- Microbatching fixes by @dakinggg in #71
- Make myself admin by @gupta-abhay in #72
- Update ci-testing to latest version by @dakinggg in #70
- Move generate to be done via `prompt_token_ids` by @bcui-db in #73
- Add GRPO assert that we need more than one generation by @bcui-db in #74 (see the sketch after this list)
- Adding a Math format verifier by @gupta-abhay in #75
- Pin foundry version and hash to prepare foundry upgrade by @bowenyang008 in #76
- Bump to torch 2.7 by @bowenyang008 in #77
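To illustrate why the assert in #74 exists: GRPO computes advantages relative to the group of generations sampled for the same prompt, so a group of size one has degenerate statistics and every advantage collapses to zero. A minimal sketch (names and shapes are illustrative, not compose-rl's actual code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages. rewards: (num_prompts, generations_per_prompt)."""
    # With one generation per prompt, the group mean equals the reward and the
    # std is degenerate, so every advantage would be zero; hence the assert.
    assert rewards.size(1) > 1, 'GRPO requires more than one generation per prompt'
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled generations each.
print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0],
                                    [0.0, 0.0, 1.0, 0.0]])))
```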
New Contributors
- @bowenyang008 made their first contribution in #76
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's New
- Online RL Algorithms: We now support PPO and GRPO for online RL training
- RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training (a sketch of the idea follows this list).
- Registries for extensible and composable design
- Robust vLLM support for efficient inference during online RL training
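The verifiable-rewards idea in one function: instead of scoring generations with a learned reward model, the reward is computed by programmatically checking the output against a known answer. A minimal sketch with a hypothetical math verifier (the helper name and answer format are illustrative, not compose-rl's API):

```python
import re

def math_answer_reward(generation: str, verified_answer: str) -> float:
    """Return 1.0 if the final \\boxed{...} answer matches the label, else 0.0."""
    # Simple pattern; a real verifier would handle nested braces and
    # mathematically equivalent forms of the same answer.
    match = re.search(r'\\boxed\{([^{}]*)\}', generation)
    if match is None:
        return 0.0  # wrong format earns no reward
    return 1.0 if match.group(1).strip() == verified_answer.strip() else 0.0

print(math_answer_reward(r'... so the total is \boxed{42}.', '42'))  # 1.0
```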
What's Changed
- Update version to match latest release by @dakinggg in #25
- attach vllm engines to state by @vchiley in #20
- Adding warning for truncating preferences by @bcui-db in #27
- Add load planner for PPO by @bcui-db in #18
- Auto set TP size by @vchiley in #29
- Enable Masking of EOS tokens list by @bcui-db in #31
- Accommodate typing changes for transformers 4.51 by @dakinggg in #33
- Dataloader changes for RLVR by @gupta-abhay in #21
- Moved the long seq fix on top of main by @abaheti95 in #34
- Changes for better reward validation by @gupta-abhay in #35
- Inheritance fix by @gupta-abhay in #37
- Simple change by @gupta-abhay in #40
- K generations per prompt by @abaheti95 in #36
- Merge READMEs for easier parsing by @gupta-abhay in #41
- Enable hf token for restricted data access by @gupta-abhay in #42
- Enable different KL estimators for training by @gupta-abhay in #44
- Update README by @bcui-db in #45
- Upgrade yapf version by @gupta-abhay in #46
- Fast inference with a single vLLM generate call per PPO iteration by @abaheti95 in #43
- Addressing cleanup comments on fast vLLM PR by @abaheti95 in #49
- Improving online RL logging by @abaheti95 in #50
- Update vLLM, enables single node Tensor parallel sizes (1, 2, 4, 8) by @bcui-db in #48
- Unified KL estimators by @gupta-abhay in #53 (see the sketch after this list)
- Add codeowners by @gupta-abhay in #54
- Add `chat` functionality to vLLM actor by @bcui-db in #55
- Exposing average log prob flag by @abaheti95 in #56
- Modifying codeowners by @gupta-abhay in #57
- GRPO implementation by @abaheti95 in #51
- Registries for extending compose-rl by @gupta-abhay in #47
- Simple tests for new registries by @gupta-abhay in #58
- Timeout change by @gupta-abhay in #59
- Fix label generation for MATH to match verification by @gupta-abhay in #60
- Changes for optional tokens list by @gupta-abhay in #61
- Minor changes for dtype and docstrings by @gupta-abhay in #62
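For context on the unified KL estimators above: online RL training typically regularizes the policy toward a reference model with a per-token KL penalty, and several sampled-token estimators of that KL are in common use (the k1/k2/k3 family from Schulman's "Approximating KL Divergence" note). A minimal sketch, assuming per-token log-probs from the policy and reference model; the names here are illustrative, not compose-rl's:

```python
import torch

def kl_estimate(logp: torch.Tensor, ref_logp: torch.Tensor, kind: str = 'k3') -> torch.Tensor:
    """Per-token estimate of KL(policy || reference) from tokens sampled from the policy."""
    log_ratio = logp - ref_logp
    if kind == 'k1':  # unbiased, high variance
        return log_ratio
    if kind == 'k2':  # biased, low variance
        return 0.5 * log_ratio ** 2
    if kind == 'k3':  # unbiased, lower variance, always non-negative
        return torch.exp(-log_ratio) - 1 + log_ratio
    raise ValueError(f'unknown estimator: {kind}')

print(kl_estimate(torch.tensor([-1.0, -2.0]), torch.tensor([-1.2, -1.5]), 'k3'))
```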
New Contributors
- @vchiley made their first contribution in #20
- @gupta-abhay made their first contribution in #21
Full Changelog: v0.4.0...v0.5.0
v0.4.0
v0.3.0
What's Changed
- Force float32 when loading transformers configs by @dakinggg in #11
- Torch 2.6 Version Bump by @abaheti95 in #13
- Preference RL refactor by @abaheti95 in #12
- Standardized the `sequence_id` batch variable to match llm-foundry by @abaheti95 in #14 (see the sketch after this list)
- Standardized attention mask field in DPO, RM, and fine-grained preferences by @abaheti95 in #15
- Updating sequence length usage by @bcui-db in #17
- Separate inference engine by @bcui-db in #16
- Upper bound vllm by @dakinggg in #19
- Update setuptools version by @irenedea in #22
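For context on the `sequence_id` standardization above: llm-foundry labels every token in a packed batch with the index of the sequence it came from, and a block-diagonal attention mask can be recovered by comparing those labels pairwise. A minimal sketch of the convention (illustrative only):

```python
import torch

# One packed row holding three sequences of lengths 3, 2, and 3.
sequence_id = torch.tensor([[0, 0, 0, 1, 1, 2, 2, 2]])

# (batch, seq, seq) boolean mask: token i may attend to token j
# only when both tokens belong to the same packed sequence.
attention_mask = sequence_id.unsqueeze(-1) == sequence_id.unsqueeze(-2)
print(attention_mask[0, :4, :4])
```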
New Contributors
- @dakinggg made their first contribution in #11
- @abaheti95 made their first contribution in #13
- @irenedea made their first contribution in #22
Full Changelog: v0.2.1...v0.3.0