Issues: huggingface/open-r1
How many resources are required to train DeepSeek-R1 671B using GRPO?
#413
opened Feb 24, 2025 by
LiuShixing
Why does the reward keep rising while the completion length only fluctuates around a flat trend?
#412
opened Feb 24, 2025 by
zhufz
Evaluation : TypeError: VLLMModelConfig.__init__() got an unexpected keyword argument 'temperature:0.6'
#408
opened Feb 24, 2025 by
ChenDRAG
train/loss and rewards/format_reward are always '0.0' while running the GRPO step
#405
opened Feb 23, 2025 by
yfliao
When computing the loss, I get logits=None, past_key_values=None, hidden_states=None, attentions=None
#404
opened Feb 23, 2025 by
royal-dargon
[SFT] Model trained on Bespoke-Stratos-17k gets very low performance on MATH-500
#396
opened Feb 22, 2025 by
ListentoMe0112
After training for 500 steps, the think section becomes shorter and shorter, and even disappears
#394
opened Feb 22, 2025 by
yaxundai
Errors in lighteval math500/gpqa evaluation due to a Sympify parsing problem
#391
opened Feb 22, 2025 by
Larry919
Evaluation error when using the original Qwen model due to max_model_length
#382
opened Feb 20, 2025 by
glennccc
Cannot find tasks extended|lcb:codegeneration in task list or in custom task registry
#378
opened Feb 20, 2025 by
wccccp
The KL divergence collapses while the format reward increases
#373
opened Feb 19, 2025 by
yuki-younai
DeepSpeed ZeRO-3 Causes Instance Crash on Large max_completion_length in GRPO Training
#372
opened Feb 19, 2025 by
troy12x