Why is the result variable when using vllm for inference, even though I’ve set the temperature to 0.01? The result should be fixed, right? #409

hellen9527 · 2025-02-24T02:40:51Z

I used vLLM to directly load the model saved after GRPO training and ran inference on a single data point from the training set. The prompt format is the same as in training and the temperature is set to 0.01, but the result is different on each run. After about 10 runs, the results start repeating. Is this caused by the GRPO training mechanism, or could there be a problem with the saved model?
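
For context, here is a minimal sketch of why a temperature of 0.01 does not by itself guarantee a fixed result. It does not use vLLM and does not reproduce the setup above: the logits are made up for illustration, and `softmax_with_temperature` and `sample_token` are hypothetical helper names. It only shows that a small positive temperature still samples, so two tokens with nearly equal logits can both be chosen, while a temperature of exactly 0 (greedy decoding) always takes the highest-scoring token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index: argmax when temperature is 0, otherwise sample."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits: the top two candidates differ by only 0.005.
EXAMPLE_LOGITS = [5.000, 4.995, 1.0]

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run may differ
    print("probabilities at T=0.01:",
          softmax_with_temperature(EXAMPLE_LOGITS, 0.01))
    print("sampled at T=0.01:",
          [sample_token(EXAMPLE_LOGITS, 0.01, rng) for _ in range(10)])
    print("greedy at T=0:",
          [sample_token(EXAMPLE_LOGITS, 0, rng) for _ in range(10)])
```

In vLLM the corresponding setting is the `temperature` field of `SamplingParams`; setting it to 0 requests greedy decoding, and `SamplingParams` also accepts a `seed` for reproducible sampling. Whether that fully removes the variation seen here is not something this sketch can confirm.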
