Commit 6711f4c

Eduard Balzin (nFunctor) authored and committed
[Frontend] Make beam search emulator temperature modifiable (vllm-project#8928)
Co-authored-by: Eduard Balzin <nfunctor@yahoo.fr>
Signed-off-by: Alvant <alvasian@yandex.ru>
1 parent ebb8213 commit 6711f4c

File tree

vllm/entrypoints/llm.py

1 file changed: 3 additions, 1 deletion


vllm/entrypoints/llm.py

Lines changed: 3 additions & 1 deletion
@@ -396,6 +396,7 @@ def beam_search(
         beam_width: int,
         max_tokens: int,
         ignore_eos: bool = False,
+        temperature: float = 0.0,
     ) -> List[BeamSearchOutput]:
         """
         Generate sequences using beam search.
@@ -405,6 +406,7 @@ def beam_search(
                 of token IDs.
             beam_width: The number of beams to keep at each step.
             max_tokens: The max number of tokens to generate for each prompt.
+            temperature: The temperature to use for generation.
 
         TODO: how does beam search work together with length penalty, frequency
         penalty, and stopping criteria, etc.?
@@ -416,7 +418,7 @@ def beam_search(
         # at https://github.com/huggingface/transformers/blob/e15687fffe5c9d20598a19aeab721ae0a7580f8a/src/transformers/generation/beam_search.py#L534 # noqa
         beam_search_params = SamplingParams(logprobs=2 * beam_width,
                                             max_tokens=1,
-                                            temperature=0.0)
+                                            temperature=temperature)
         instances: List[BeamSearchInstance] = []
 
         for prompt in prompts:
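
For context, a minimal usage sketch of the new keyword (not part of the commit). The model name, prompt, and output handling below are illustrative assumptions; only the temperature parameter itself comes from this change.

# Illustrative sketch; model and prompt are assumptions, not part of this commit.
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # any vLLM-supported model

# The emulator previously always expanded beams greedily (temperature=0.0);
# after this commit, callers can override that:
outputs = llm.beam_search(
    prompts=["The capital of France is"],
    beam_width=4,
    max_tokens=16,
    temperature=0.5,  # new in this commit; defaults to 0.0
)

# Each element is a BeamSearchOutput holding the candidate sequences
# for the corresponding prompt.
for output in outputs:
    print(output)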
