Releases · databricks/compose-rl
v0.6.0
What's Changed
- Added verified answers to the logging by @abaheti95 in #63
- Adding GPU CI back by @dakinggg in #64
- Fix args propagation by @dakinggg in #65
- Fix weight propagation by @bcui-db in #66
- Microbatching fixes by @dakinggg in #71
- Make myself admin by @gupta-abhay in #72
- Update ci-testing to latest version by @dakinggg in #70
- Move generate to be done via `prompt_token_ids` by @bcui-db in #73
- Add GRPO assert that we need more than one generation by @bcui-db in #74 (see the sketch after this list)
- Adding a Math format verifier by @gupta-abhay in #75
- Pin foundry version and hash to prepare foundry upgrade by @bowenyang008 in #76
- Bump to torch 2.7 by @bowenyang008 in #77
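To illustrate why the assert in #74 exists: GRPO computes advantages relative to the group of generations sampled for the same prompt, so a group of size one has degenerate statistics and every advantage collapses to zero. A minimal sketch (names and shapes are illustrative, not compose-rl's actual code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages. rewards: (num_prompts, generations_per_prompt)."""
    # With one generation per prompt, the group mean equals the reward and the
    # std is degenerate, so every advantage would be zero; hence the assert.
    assert rewards.size(1) > 1, 'GRPO requires more than one generation per prompt'
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled generations each.
print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0],
                                    [0.0, 0.0, 1.0, 0.0]])))
```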
New Contributors
- @bowenyang008 made their first contribution in #76
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's New
- Online RL Algorithms: We now support PPO and GRPO for online RL training
- RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training (a sketch of the idea follows this list).
- Registries for extensible and composable design
- Robust vLLM support for efficient inference during online RL training
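The verifiable-rewards idea in one function: instead of scoring generations with a learned reward model, the reward is computed by programmatically checking the output against a known answer. A minimal sketch with a hypothetical math verifier (the helper name and answer format are illustrative, not compose-rl's API):

```python
import re

def math_answer_reward(generation: str, verified_answer: str) -> float:
    """Return 1.0 if the final \\boxed{...} answer matches the label, else 0.0."""
    # Simple pattern; a real verifier would handle nested braces and
    # mathematically equivalent forms of the same answer.
    match = re.search(r'\\boxed\{([^{}]*)\}', generation)
    if match is None:
        return 0.0  # wrong format earns no reward
    return 1.0 if match.group(1).strip() == verified_answer.strip() else 0.0

print(math_answer_reward(r'... so the total is \boxed{42}.', '42'))  # 1.0
```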
What's Changed
- Update version to match latest release by @dakinggg in #25
- attach vllm engines to state by @vchiley in #20
- Adding warning for truncating preferences by @bcui-db in #27
- Add load planner for PPO by @bcui-db in #18
- Auto set TP size by @vchiley in #29
- Enable Masking of EOS tokens list by @bcui-db in #31
- Accommodate typing changes for transformers 4.51 by @dakinggg in #33
- Dataloader changes for RLVR by @gupta-abhay in #21
- Moved the long seq fix on top of main by @abaheti95 in #34
- Changes for better reward validation by @gupta-abhay in #35
- Inheritance fix by @gupta-abhay in #37
- Simple change by @gupta-abhay in #40
- K generations per prompt by @abaheti95 in #36
- Merge READMEs for easier parsing by @gupta-abhay in #41
- Enable hf token for restricted data access by @gupta-abhay in #42
- Enable different KL estimators for training by @gupta-abhay in #44
- Update README by @bcui-db in #45
- Upgrade yapf version by @gupta-abhay in #46
- Fast inference with a single vLLM generate call per PPO iteration by @abaheti95 in #43
- Addressing cleanup comments on fast vLLM PR by @abaheti95 in #49
- Improving online RL logging by @abaheti95 in #50
- Update vLLM, enables single node Tensor parallel sizes (1, 2, 4, 8) by @bcui-db in #48
- Unified KL estimators by @gupta-abhay in #53 (see the sketch after this list)
- Add codeowners by @gupta-abhay in #54
- Add `chat` functionality to vLLM actor by @bcui-db in #55
- Exposing average log prob flag by @abaheti95 in #56
- Modifying codeowners by @gupta-abhay in #57
- GRPO implementation by @abaheti95 in #51
- Registries for extending compose-rl by @gupta-abhay in #47
- Simple tests for new registries by @gupta-abhay in #58
- Timeout change by @gupta-abhay in #59
- Fix label generation for MATH to match verification by @gupta-abhay in #60
- Changes for optional tokens list by @gupta-abhay in #61
- Minor changes for dtype and docstrings by @gupta-abhay in #62
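For context on the unified KL estimators above: online RL training typically regularizes the policy toward a reference model with a per-token KL penalty, and several sampled-token estimators of that KL are in common use (the k1/k2/k3 family from Schulman's "Approximating KL Divergence" note). A minimal sketch, assuming per-token log-probs from the policy and reference model; the names here are illustrative, not compose-rl's:

```python
import torch

def kl_estimate(logp: torch.Tensor, ref_logp: torch.Tensor, kind: str = 'k3') -> torch.Tensor:
    """Per-token estimate of KL(policy || reference) from tokens sampled from the policy."""
    log_ratio = logp - ref_logp
    if kind == 'k1':  # unbiased, high variance
        return log_ratio
    if kind == 'k2':  # biased, low variance
        return 0.5 * log_ratio ** 2
    if kind == 'k3':  # unbiased, lower variance, always non-negative
        return torch.exp(-log_ratio) - 1 + log_ratio
    raise ValueError(f'unknown estimator: {kind}')

print(kl_estimate(torch.tensor([-1.0, -2.0]), torch.tensor([-1.2, -1.5]), 'k3'))
```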
New Contributors
- @vchiley made their first contribution in #20
- @gupta-abhay made their first contribution in #21
Full Changelog: v0.4.0...v0.5.0
v0.4.0
v0.3.0
What's Changed
- Force float32 when loading transformers configs by @dakinggg in #11
- Torch 2.6 Version Bump by @abaheti95 in #13
- Preference RL refactor by @abaheti95 in #12
- Standardized the `sequence_id` batch variable to match llm-foundry by @abaheti95 in #14 (see the sketch after this list)
- Standardized attention mask field in DPO, RM, and fine-grained preferences by @abaheti95 in #15
- Updating sequence length usage by @bcui-db in #17
- Separate inference engine by @bcui-db in #16
- Upper bound vllm by @dakinggg in #19
- Update setuptools version by @irenedea in #22
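For context on the `sequence_id` standardization above: llm-foundry labels every token in a packed batch with the index of the sequence it came from, and a block-diagonal attention mask can be recovered by comparing those labels pairwise. A minimal sketch of the convention (illustrative only):

```python
import torch

# One packed row holding three sequences of lengths 3, 2, and 3.
sequence_id = torch.tensor([[0, 0, 0, 1, 1, 2, 2, 2]])

# (batch, seq, seq) boolean mask: token i may attend to token j
# only when both tokens belong to the same packed sequence.
attention_mask = sequence_id.unsqueeze(-1) == sequence_id.unsqueeze(-2)
print(attention_mask[0, :4, :4])
```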
New Contributors
- @dakinggg made their first contribution in #11
- @abaheti95 made their first contribution in #13
- @irenedea made their first contribution in #22
Full Changelog: v0.2.1...v0.3.0