add debug and qwen0.5b config file #402

GitMonkey0 · 2025-02-23T15:59:20Z

No description provided.

* first commit * working training * change model_id * Update scripts/training/sft.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Add synthetic data generation script Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com> * Fix format * Fix imports sorting --------- Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com>

) * Add math-verify to check accuracy of completions on GRPO * Handle make_conversation * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix quality * Remove unnecesary item access in parsed answer --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad chekpoint * drop grad checkpoint * except --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad chekpoint * drop grad checkpoint * except * copyrights --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

…e#214) * [Testing Github workflow] Updating workflows and makefile * [Testing Github workflow] - Refactoring workflow, fixing tests erorr, easier debugging * [Testing Github workflow] Converting docstring into raw string * [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test * [Testing Github workflow] Removing redundant test

* add kimi len_reward * add to REWARD_FUNCS_REGISTRY * fix formatting * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * missing import --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

@wraps

* [Weighted reward functions] Adding functionality to weigh rewards. Tests. * [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata * style * Changing grpo.py tests to run if cuda is available * style * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* Add stuff * Make it kind of work * Add more stuff * Add fix for parse * Fix * Refactor * Clean up * Fix config * Fix sys * Add SFT config * Use min rate * Fix eval * Add base model * Add s1k * Disable eval * Fix * Add import checker * Fix importer * Fix * Tune config * Tune * Fix * Fix save * Tuen beta * Remove configs * Fix vLLM * Fix * Add note * Add doc * doc * Fix * Tune lr * Add command

* Rename solutions to solution for `len_reward` * Fix docstring for len_reward * Update src/open_r1/rewards.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

lewtun and others added 30 commits January 24, 2025 16:44

Initial commit

12c26d5

Add skeleton

bacfc9b

Add data

06a35b5

Update setup.py (huggingface#1)

bc0153f

Add configs and stuff (huggingface#2)

2e940b8

Update README.md

40b0af9

Adds Math-500 and AIME24 evals (huggingface#4)

9239a8e

* adds evals * up max model len --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

Refactor evaluation (huggingface#6)

4cb6c95

REFACTOR TO THE MAX (huggingface#7)

d8aa42d

Fix configs

801f91f

GRPO script (huggingface#3)

fd24923

* inital commit * with reward func * fix box extract * example line * don't break when answer malformed * command and logging * holly simplicity * move grpo * reverse readme * instructions

fix slurm (huggingface#8)

bbc790f

System prompt; Fix readme command

fa0a124

use liger kernel

9a3a746

Fix generate.slurm (huggingface#10)

0bb0d13

Add SFT command to the readme (huggingface#15)

907c3f7

Pin main for transformers and trl

bb7f0b3

Add diagram (huggingface#16)

2119e75

Scale image

5309609

Fix eval comamnds (huggingface#18)

a521d23

Update README.md

cf3a72b

Fix Slurm SFT and gather Slurm scripts (huggingface#19)

592d049

* Fix slurm * Fix generate * Fix install * Fix c

fix sft.slurm

1c2b604

Update README.md

a35a582

Fix passing vLLM server URL (huggingface#21)

e75a228

* Use head node ip as vLLM server url * Pass correct server url * Add num_generations argument * Fix style * Remove `select` --------- Co-authored-by: plaguss <agustin@argilla.io>

anton-l and others added 28 commits February 10, 2025 16:53

Rename to generate_reasoning.py (huggingface#275)

e658faa

fix(sft recipes): remove duplicate packing option from config (huggingface#280)

e07650e

new grpo logic (huggingface#274)

f7d4586

Fix uuid in the data generator (huggingface#284)

34296fe

* fix uuid issues

Enable Weights & Biases defaults to be overridden in training (huggingface#294)

0dce187

* Enable WandB defaults to be set * Fix

bump vllm to version to 0.7.2 (huggingface#311)

f80eb04

VLLM has made a number of throughput improvements in version 0.7.2, so it's worth bumping the version, particularly for GRPO training runs.

move details script and fix wandb logging (huggingface#314)

e2ec6ff

Fix logging import (huggingface#316)

a8f03fb

Revert "Weighted reward functions (huggingface#213)" (huggingface#317)

71a8cbb

This reverts commit a75dc8c.

Update setup.py (huggingface#315)

984db29

adds peft as a temp dep due to huggingface/trl#2849

Add SFT configuration for Mistral-Small-24B-Instruct-2501 model (huggingface#348)

e19d5b8

* Add SFT configuration for Mistral-Small-24B-Instruct-2501 model * Rename config_numina.yaml to config_openr1_math.yaml --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Adding grpo reward args into yaml files (huggingface#337)

d6b1656

Enable chat template and system prompt to be configured during training (huggingface#349)

321ebf8

* Enable chat template to be configured * Add notes to README * Handle None * Remove default system prompt * Fix ST * Tune hparams * Fix * Tune * Fix

Update AIME25 task configuration and registration (huggingface#344)

a5c1e41

Add LiveCodeBench's codegeneration task from lighteval (huggingface#346)

610a95b

* Add lcb:codegeneration task from ligtheval * Add results from R1 Qwen 32B

Fix LightEval commands and dependencies (huggingface#386)

911d8e7

* Fix lighteval cmd * Fix typo * Pin lighteval * Hacks to the max * Fix slurm * Fix * Pin lighteval * Pin l --------- Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>

Language specific code format reward (huggingface#377)

f0c25af

Pin dependencies (huggingface#393)

9ef98a9

Update prompt template and sampling parameters for evaluation (huggingface#392)

dbdccae

* Pin t * Pin t * Set top p * C * Tune math prompt * Improve math prompt * Update tables

add debug and qwen0.5b config file

80eeb60

basic structure build

80b2149

Remove large file and add it to .gitignore

559d6d2

Remove large file and add it to .gitignore

1568888

GitMonkey0 closed this Feb 24, 2025

GitMonkey0 force-pushed the dev branch from 68f3da4 to 1568888 Compare February 24, 2025 08:46
