Proper fill-in-middle support #1386


Merged: 8 commits merged on May 8, 2024

47 changes: 41 additions & 6 deletions llama_cpp/llama.py
@@ -955,18 +955,53 @@ def _create_completion(

         completion_id: str = f"cmpl-{str(uuid.uuid4())}"
         created: int = int(time.time())
+        prefix_token_id: int = int(self.metadata.get("tokenizer.ggml.prefix_token_id", self._model.token_prefix()))
+        middle_token_id: int = int(self.metadata.get("tokenizer.ggml.middle_token_id", self._model.token_middle()))
+        suffix_token_id: int = int(self.metadata.get("tokenizer.ggml.suffix_token_id", self._model.token_suffix()))
         # If prompt is empty, initialize completion with BOS token to avoid
         # detokenization including a space at the beginning of the completion
         completion_tokens: List[int] = [] if len(prompt) > 0 else [self.token_bos()]
         # Add blank space to start of prompt to match OG llama tokenizer
@CISC (Contributor Author) commented on Apr 27, 2024:

What blank space does this comment refer to?

Also, later on in this method I see prompt_tokens[1:] and comments about removing BOS, but BOS is explicitly skipped, so is this actually an attempt to skip the "blank space"?

Owner replied:

It's to avoid a bug where generating from a completely empty prompt causes a leading space to always be added, but I think it may actually be incorrect for non-llama tokenizers; I'll have to check.
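A minimal sketch of the leading-space behaviour being discussed (the model path and exact output are assumptions, not from this PR): with a SentencePiece-style llama tokenizer, detokenizing a sequence that is not anchored by BOS tends to come back with a leading space, which is what seeding completion_tokens with BOS works around.

    from llama_cpp import Llama

    # Hypothetical model path; vocab_only=True loads only the tokenizer.
    llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", vocab_only=True)

    toks = llm.tokenize(b"Hello", add_bos=False)
    # With a SentencePiece-style vocab the round trip may come back as b' Hello',
    # i.e. with a spurious leading space; seeding completion_tokens with BOS is
    # the workaround _create_completion uses when the prompt is empty.
    print(llm.detokenize(toks))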

         prompt_tokens: List[int] = (
             (
-                self.tokenize(prompt.encode("utf-8"), special=True)
-                if prompt != ""
-                else [self.token_bos()]
+                [prefix_token_id]
+                if prefix_token_id >= 0 and suffix is not None
+                else []
+            )
+            +
+            (
+                (
+                    self.tokenize(prompt.encode("utf-8"), add_bos=(prefix_token_id < 0 or suffix is None), special=(prefix_token_id < 0 or suffix is None))
+                    if prompt != ""
+                    else (
+                        []
+                        if prefix_token_id >= 0 and suffix is not None
+                        else [self.token_bos()]
+                    )
+                )
+                if isinstance(prompt, str)
+                else prompt
+            )
+            +
+            (
+                (
+                    [suffix_token_id]
+                    +
+                    (
+                        self.tokenize(suffix.encode("utf-8"), add_bos=False, special=False)
+                        if suffix
+                        else []
+                    )
+                )
+                if suffix_token_id >= 0 and suffix is not None
+                else []
+            )
+            +
+            (
+                [middle_token_id]
+                if middle_token_id >= 0 and suffix is not None
+                else []
             )
-            if isinstance(prompt, str)
-            else prompt
         )
         text: bytes = b""
         returned_tokens: int = 0
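Taken together, when the loaded model exposes native FIM tokens and a suffix is passed, the prompt is now assembled as prefix-token + prompt + suffix-token + suffix + middle-token instead of being tokenized as plain text. A hedged usage sketch (the model path and generated text are assumptions, not from this PR):

    from llama_cpp import Llama

    llm = Llama(model_path="./models/codellama-7b.Q4_K_M.gguf")  # hypothetical FIM-capable model

    out = llm.create_completion(
        prompt="def add(a, b):\n    ",      # code before the cursor
        suffix="\n\nprint(add(1, 2))\n",    # code after the cursor
        max_tokens=32,
    )
    # The model fills in only the middle, e.g. "return a + b"; with a native
    # suffix token the suffix string is no longer appended to the returned text.
    print(out["choices"][0]["text"])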
@@ -1346,7 +1381,7 @@ def logit_bias_processor(
         if echo:
             text_str = prompt + text_str
 
-        if suffix is not None:
+        if suffix_token_id < 0 and suffix is not None:
             text_str = text_str + suffix
 
         logprobs_or_none: Optional[CompletionLogprobs] = None
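The guard above means the suffix is only stitched back onto the returned text for models without a native suffix token; FIM-capable models return just the generated middle, so callers who want the full document reassemble it themselves. A small sketch (all names and values are hypothetical, not part of the library API):

    def assemble_document(prefix: str, middle: str, suffix: str) -> str:
        # With a FIM-capable model the completion text is only the middle part,
        # so the full file is reassembled by the caller.
        return prefix + middle + suffix

    # Hypothetical values mirroring the earlier sketch.
    print(assemble_document("def add(a, b):\n    ", "return a + b", "\n\nprint(add(1, 2))\n"))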