I loaded the model saved after GRPO training directly with vLLM and ran inference on a single example from the training set. The prompt format is the same as in training and the temperature is set to 0.01, yet the output is different on every run; after about 10 runs, the results start to repeat. Is this caused by the GRPO training mechanism, or could there be a problem with the saved model?
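One possible factor, offered as an illustration rather than a diagnosis: a temperature of 0.01 is close to greedy decoding but is still sampling. The sketch below applies a standard temperature-scaled softmax (not necessarily vLLM's exact code path) to made-up logits. When two candidate tokens have logits within about 0.01 of each other, the runner-up is still sampled a large fraction of the time, so outputs can diverge from run to run; with a clear gap, sampling is effectively deterministic. The logit values, the helper names, and the seed are all hypothetical.

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Return softmax(logits / temperature), computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy decoding: always pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical next-token logits, chosen only for illustration.
near_tie = [5.000, 4.995]  # two tokens almost tied
wide_gap = [5.0, 3.0]      # one token clearly preferred

p_tie = temperature_softmax(near_tie, 0.01)  # roughly [0.62, 0.38]
p_gap = temperature_softmax(wide_gap, 0.01)  # essentially [1.0, 0.0]

# Sample the near-tie distribution many times with a fixed seed.
rng = random.Random(0)
draws = [rng.choices(range(len(p_tie)), weights=p_tie)[0] for _ in range(1000)]
runner_up_fraction = draws.count(1) / len(draws)
```

Here `greedy(near_tie)` always returns 0, while sampling at temperature 0.01 picks token 1 in roughly 38% of draws. For comparison, the vLLM documentation describes `SamplingParams(temperature=0)` as greedy decoding and lists a `seed` argument for reproducible sampling; those settings are not exercised in this sketch.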