             suffix: A suffix to append to the generated text. If None, no suffix is appended.
             max_tokens: The maximum number of tokens to generate. If max_tokens <= 0 or None, the maximum number of tokens to generate is unlimited and depends on n_ctx.
-            min_tokens: The minimum number of tokens to generate.
+            min_tokens: The minimum number of tokens to generate. It may return fewer tokens if another condition is met (e.g. max_tokens, stop).
             temperature: The temperature to use for sampling.
             top_p: The top-p value to use for nucleus sampling. Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
             min_p: The min-p value to use for minimum p sampling. Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
@@ -1520,7 +1517,7 @@ def __call__(
         prompt: str,
         suffix: Optional[str] = None,
         max_tokens: Optional[int] = 16,
-        min_tokens: Optional[int] = 1,
+        min_tokens: int = 0,
         temperature: float = 0.8,
         top_p: float = 0.95,
         min_p: float = 0.05,
@@ -1550,7 +1547,7 @@ def __call__(
             prompt: The prompt to generate text from.
             suffix: A suffix to append to the generated text. If None, no suffix is appended.
             max_tokens: The maximum number of tokens to generate. If max_tokens <= 0 or None, the maximum number of tokens to generate is unlimited and depends on n_ctx.
-            min_tokens: The minimum number of tokens to generate.
+            min_tokens: The minimum number of tokens to generate. It may return fewer tokens if another condition is met (e.g. max_tokens, stop).
             temperature: The temperature to use for sampling.
             top_p: The top-p value to use for nucleus sampling. Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
             min_p: The min-p value to use for minimum p sampling. Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
             response_format: The response format to use for the chat completion. Use { "type": "json_object" } to constrain output to only valid json.
             max_tokens: The maximum number of tokens to generate. If max_tokens <= 0 or None, the maximum number of tokens to generate is unlimited and depends on n_ctx.
-            min_tokens: The minimum number of tokens to generate.
+            min_tokens: The minimum number of tokens to generate. It may return fewer tokens if another condition is met (e.g. max_tokens, stop).
             presence_penalty: The penalty to apply to tokens based on their presence in the prompt.
             frequency_penalty: The penalty to apply to tokens based on their frequency in the prompt.
             repeat_penalty: The penalty to apply to repeated tokens.
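For context, here is a minimal usage sketch of the changed parameter against the high-level completion API. The model path and prompt are placeholders, and the OpenAI-style response dict shape is assumed; per the updated docstring, a call with min_tokens may still return fewer tokens when another condition such as max_tokens or a stop sequence is hit first.

```python
from llama_cpp import Llama

# Placeholder path -- point this at any local GGUF model.
llm = Llama(model_path="./models/model.gguf")

# min_tokens now defaults to 0 (no minimum enforced). Asking for a minimum of
# 8 tokens may still return fewer if max_tokens or a stop string is reached
# first, as the updated docstring notes.
out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    min_tokens=8,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```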
llama_cpp/server/types.py (+3 -3)
@@ -17,7 +17,7 @@
 )

 min_tokens_field = Field(
-    default=1, ge=1, description="The minimum number of tokens to generate."
+    default=0, ge=0, description="The minimum number of tokens to generate. It may return fewer tokens if another condition is met (e.g. max_tokens, stop)."
 )

 temperature_field = Field(
@@ -117,7 +117,7 @@ class CreateCompletionRequest(BaseModel):
     max_tokens: Optional[int] = Field(
         default=16, ge=0, description="The maximum number of tokens to generate."
     )
-    min_tokens: Optional[int] = min_tokens_field
+    min_tokens: int = min_tokens_field
     temperature: float = temperature_field
     top_p: float = top_p_field
     min_p: float = min_p_field
@@ -213,7 +213,7 @@ class CreateChatCompletionRequest(BaseModel):
         default=None,
         description="The maximum number of tokens to generate. Defaults to inf",
     )
-    min_tokens: Optional[int] = min_tokens_field
+    min_tokens: int = min_tokens_field
     logprobs: Optional[bool] = Field(
         default=False,
         description="Whether to output the logprobs or not. Default is True"
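To show what the server-side change means in practice, here is a small self-contained sketch that re-creates the shared field pattern from the diff with a toy Pydantic model (the toy class is illustrative only, not the project's actual request model): the default drops from 1 to 0, ge=0 still rejects negative values, and the attribute is now a plain int rather than Optional[int].

```python
from pydantic import BaseModel, Field, ValidationError

# Re-creation of the shared field as it reads after this change
# (illustrative; the real definition lives in llama_cpp/server/types.py).
min_tokens_field = Field(
    default=0,
    ge=0,
    description=(
        "The minimum number of tokens to generate. It may return fewer "
        "tokens if another condition is met (e.g. max_tokens, stop)."
    ),
)

class ToyCompletionRequest(BaseModel):
    prompt: str
    min_tokens: int = min_tokens_field  # plain int now, no Optional wrapper

print(ToyCompletionRequest(prompt="hi").min_tokens)  # 0 -- new default, i.e. no minimum

try:
    ToyCompletionRequest(prompt="hi", min_tokens=-1)
except ValidationError:
    print("min_tokens=-1 rejected by the ge=0 constraint")
```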