Issues: huggingface/open-r1
How many resources are required to train DeepSeek-R1 671B using GRPO?
#413
opened Feb 24, 2025 by
LiuShixing
Why does the reward keep rising while the completion length only fluctuates around a flat trend?
#412
opened Feb 24, 2025 by
zhufz
Evaluation : TypeError: VLLMModelConfig.__init__() got an unexpected keyword argument 'temperature:0.6'
#408
opened Feb 24, 2025 by
ChenDRAG
train/loss and rewards/format_reward are always '0.0' while running the GRPO step
#405
opened Feb 23, 2025 by
yfliao
When computing the loss, I get logits=None, past_key_values=None, hidden_states=None, attentions=None
#404
opened Feb 23, 2025 by
royal-dargon
[SFT] Model trained on Bespoke-Stratos-17k gets very low performance on MATH-500
#396
opened Feb 22, 2025 by
ListentoMe0112
After training for 500 steps, the think section becomes shorter and shorter, and even disappears
#394
opened Feb 22, 2025 by
yaxundai
Errors in lighteval math500/gpqa evaluation due to a Sympify parsing problem
#391
opened Feb 22, 2025 by
Larry919
Evaluation error when using the original Qwen model due to max_model_length
#382
opened Feb 20, 2025 by
glennccc
Cannot find tasks extended|lcb:codegeneration in task list or in custom task registry
#378
opened Feb 20, 2025 by
wccccp
The KL divergence collapses while the format reward increases
#373
opened Feb 19, 2025 by
yuki-younai
DeepSpeed ZeRO-3 Causes Instance Crash on Large max_completion_length in GRPO Training
#372
opened Feb 19, 2025 by
troy12x