add debug and qwen0.5b config file #402

Closed
wants to merge 126 commits into from

Conversation

GitMonkey0
No description provided.

lewtun and others added 30 commits January 24, 2025 16:44
* first commit

* working training

* change model_id

* Update scripts/training/sft.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* adds evals

* up max model len

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* initial commit

* with reward func

* fix box extract

* example line

* don't break when answer malformed

* command and logging

* holly simplicity

* move grpo

* reverse readme

* instructions
* Add synthetic data generation script

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>

* Fix format

* Fix imports sorting

---------

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>
)

* Add math-verify to check accuracy of completions on GRPO

* Handle make_conversation

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix quality

* Remove unnecessary item access in parsed answer

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Fix slurm

* Fix generate

* Fix install

* Fix c
* handle error in verification

* command with zero2 and catch more error in verifier

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* deepseek distill and remove grad checkpoint

* drop grad checkpoint

* except

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* handle error in verification

* command with zero2 and catch more error in verifier

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* deepseek distill and remove grad checkpoint

* drop grad checkpoint

* except

* copyrights

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Use head node ip as vLLM server url

* Pass correct server url

* Add num_generations argument

* Fix style

* Remove `select`

---------

Co-authored-by: plaguss <agustin@argilla.io>
anton-l and others added 28 commits February 10, 2025 16:53
…e#214)

* [Testing Github workflow] Updating workflows and makefile

* [Testing Github workflow] - Refactoring workflow, fixing test errors, easier debugging

* [Testing Github workflow] Converting docstring into raw string

* [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test

* [Testing Github workflow] Removing redundant test
vLLM has made a number of throughput improvements in version 0.7.2, so it's worth bumping the version, particularly for GRPO training runs.
* add kimi len_reward

* add to REWARD_FUNCS_REGISTRY

* fix formatting

* Update src/open_r1/grpo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/grpo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/grpo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* missing import

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* [Weighted reward functions] Adding functionality to weigh rewards. Tests.

* [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata

* style

* Changing grpo.py tests to run if cuda is available

* style

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
…ingface#348)

* Add SFT configuration for Mistral-Small-24B-Instruct-2501 model

* Rename config_numina.yaml to config_openr1_math.yaml

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
…ng (huggingface#349)

* Enable chat template to be configured

* Add notes to README

* Handle None

* Remove default system prompt

* Fix ST

* Tune hparams

* Fix

* Tune

* Fix
* Add lcb:codegeneration task from lighteval

* Add results from R1 Qwen 32B
* Add stuff

* Make it kind of work

* Add more stuff

* Add fix for parse

* Fix

* Refactor

* Clean up

* Fix config

* Fix sys

* Add SFT config

* Use min rate

* Fix eval

* Add base model

* Add s1k

* Disable eval

* Fix

* Add import checker

* Fix importer

* Fix

* Tune config

* Tune

* Fix

* Fix save

* Tune beta

* Remove configs

* Fix vLLM

* Fix

* Add note

* Add doc

* doc

* Fix

* Tune lr

* Add command
* Rename solutions to solution for `len_reward`

* Fix docstring for len_reward

* Update src/open_r1/rewards.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Fix lighteval cmd

* Fix typo

* Pin lighteval

* Hacks to the max

* Fix slurm

* Fix

* Pin lighteval

* Pin l

---------

Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
…gface#392)

* Pin t

* Pin t

* Set top p

* C

* Tune math prompt

* Improve math prompt

* Update tables