add debug and qwen0.5b config file #402
Closed
+0
−0
* first commit * working training * change model_id * Update scripts/training/sft.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* adds evals * up max model len --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* initial commit * with reward func * fix box extract * example line * don't break when answer malformed * command and logging * holy simplicity * move grpo * reverse readme * instructions
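The "fix box extract" and "don't break when answer malformed" items concern pulling the final `\boxed{...}` answer out of a model completion. The following is a minimal sketch of that idea, assuming answers are wrapped in LaTeX `\boxed{}` and may contain nested braces. The helper name `extract_boxed` is hypothetical and is not the function used in this repository.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Hypothetical helper: return the contents of the last \\boxed{...} in text.

    Returns None when there is no \\boxed{ or when its braces are unbalanced,
    so a malformed answer is scored as wrong instead of raising an exception.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for j in range(begin, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:j]
    # Ran off the end without closing the brace: treat as malformed.
    return None


print(extract_boxed(r"so the answer is \boxed{\frac{1}{2}}"))  # \frac{1}{2}
print(extract_boxed("no final answer here"))  # None
```

Counting brace depth, rather than using a non-greedy regex, keeps nested expressions such as `\frac{1}{2}` intact.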
* Add synthetic data generation script Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com> * Fix format * Fix imports sorting --------- Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com>
* Add math-verify to check accuracy of completions on GRPO * Handle make_conversation * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix quality * Remove unnecessary item access in parsed answer --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Fix slurm * Fix generate * Fix install * Fix c
* handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad checkpoint * drop grad checkpoint * except --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad checkpoint * drop grad checkpoint * except * copyrights --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Use head node ip as vLLM server url * Pass correct server url * Add num_generations argument * Fix style * Remove `select` --------- Co-authored-by: plaguss <agustin@argilla.io>
…e#214) * [Testing Github workflow] Updating workflows and makefile * [Testing Github workflow] - Refactoring workflow, fixing tests error, easier debugging * [Testing Github workflow] Converting docstring into raw string * [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test * [Testing Github workflow] Removing redundant test
* fix uuid issues
…gface#294) * Enable WandB defaults to be set * Fix
vLLM has made a number of throughput improvements in version 0.7.2, so it's worth bumping the version, particularly for GRPO training runs.
* add kimi len_reward * add to REWARD_FUNCS_REGISTRY * fix formatting * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * missing import --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
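The `len_reward` added here is based on the length penalty described in the Kimi k1.5 report: within a group of sampled completions, λ = 0.5 − (len − min_len) / (max_len − min_len), correct completions receive λ, and incorrect ones receive min(0, λ). The sketch below implements only that formula, assuming per-completion lengths and correctness flags are already computed. The name `kimi_length_reward` and its signature are hypothetical and do not match the `len_reward` in `src/open_r1/rewards.py`.

```python
from typing import List


def kimi_length_reward(lengths: List[int], correct: List[bool]) -> List[float]:
    """Hypothetical sketch of a Kimi-k1.5-style length reward for one group."""
    min_len, max_len = min(lengths), max(lengths)
    if max_len == min_len:
        # Every completion has the same length, so there is no length signal.
        return [0.0] * len(lengths)
    rewards = []
    for length, is_correct in zip(lengths, correct):
        # lam falls linearly from 0.5 (shortest) to -0.5 (longest).
        lam = 0.5 - (length - min_len) / (max_len - min_len)
        # Correct answers get lam; incorrect answers can only be penalized.
        rewards.append(lam if is_correct else min(0.0, lam))
    return rewards


print(kimi_length_reward([10, 20, 30], [True, False, True]))  # [0.5, 0.0, -0.5]
```

Clamping incorrect answers at zero means a short wrong answer is never rewarded for being short, while long wrong answers are still penalized.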
* [Weighted reward functions] Adding functionality to weigh rewards. Tests. * [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata * style * Changing grpo.py tests to run if cuda is available * style * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
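The weighted-reward commit wraps reward functions so their outputs can be scaled, using `@wraps` so the wrapped function keeps its original name and docstring (useful when rewards are logged per function name). The following is a minimal sketch of that pattern, assuming a reward function returns one float per completion. The helper `weighted` and the toy `accuracy_reward` are hypothetical and are not the code in this repository.

```python
from functools import wraps
from typing import Callable, List

RewardFunc = Callable[..., List[float]]


def weighted(reward_func: RewardFunc, weight: float) -> RewardFunc:
    """Hypothetical helper: multiply every reward from reward_func by weight."""

    @wraps(reward_func)  # copy __name__, __doc__, etc. from the original function
    def wrapper(*args, **kwargs) -> List[float]:
        return [weight * r for r in reward_func(*args, **kwargs)]

    return wrapper


def accuracy_reward(completions, **kwargs):
    """Toy reward: 1.0 if the completion is exactly "42", else 0.0."""
    return [1.0 if c == "42" else 0.0 for c in completions]


weighted_accuracy = weighted(accuracy_reward, 0.5)
print(weighted_accuracy.__name__)  # accuracy_reward
print(weighted_accuracy(["42", "7"]))  # [0.5, 0.0]
```

Without `@wraps`, every weighted reward would report the name `wrapper`, making per-reward metrics indistinguishable.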
This reverts commit a75dc8c.
adds peft as a temp dep due to huggingface/trl#2849
…ingface#348) * Add SFT configuration for Mistral-Small-24B-Instruct-2501 model * Rename config_numina.yaml to config_openr1_math.yaml --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
…ng (huggingface#349) * Enable chat template to be configured * Add notes to README * Handle None * Remove default system prompt * Fix ST * Tune hparams * Fix * Tune * Fix
* Add lcb:codegeneration task from lighteval * Add results from R1 Qwen 32B
* Add stuff * Make it kind of work * Add more stuff * Add fix for parse * Fix * Refactor * Clean up * Fix config * Fix sys * Add SFT config * Use min rate * Fix eval * Add base model * Add s1k * Disable eval * Fix * Add import checker * Fix importer * Fix * Tune config * Tune * Fix * Fix save * Tune beta * Remove configs * Fix vLLM * Fix * Add note * Add doc * doc * Fix * Tune lr * Add command
* Rename solutions to solution for `len_reward` * Fix docstring for len_reward * Update src/open_r1/rewards.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Fix lighteval cmd * Fix typo * Pin lighteval * Hacks to the max * Fix slurm * Fix * Pin lighteval * Pin l --------- Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
…gface#392) * Pin t * Pin t * Set top p * C * Tune math prompt * Improve math prompt * Update tables
No description provided.