Proper fill-in-middle support #1386
Conversation
Use prefix/middle/suffix tokens when metadata is present in GGUF, e.g. in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.
See this PR on how to add this metadata to GGUFs that do not have it (recent CodeLlama and CodeGemma GGUFs should have gotten it automatically on conversion).
@abetlen BTW, I initially thought of making `prompt_tokens` a generator expression to be able to use `chain()` instead of basically copying the whole token list, but since it's referenced as a list in so many places I opted for the less invasive method instead.
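For context, a minimal sketch (not the PR's actual code) of how a fill-in-middle prompt is typically assembled from these three sentinel tokens, assuming a `tokenize` callable with llama-cpp-python's signature:

```python
from typing import List

def build_fim_prompt_tokens(
    tokenize,              # e.g. a Llama.tokenize-style callable (assumption)
    prefix: str,
    suffix: str,
    prefix_token_id: int,
    suffix_token_id: int,
    middle_token_id: int,
) -> List[int]:
    # Standard PSM (prefix-suffix-middle) layout:
    #   <FIM_PREFIX> prefix tokens <FIM_SUFFIX> suffix tokens <FIM_MIDDLE>
    # The model then generates the missing "middle" after the final sentinel.
    tokens: List[int] = [prefix_token_id]
    tokens += tokenize(prefix.encode("utf-8"), add_bos=False, special=False)
    tokens.append(suffix_token_id)
    tokens += tokenize(suffix.encode("utf-8"), add_bos=False, special=False)
    tokens.append(middle_token_id)
    return tokens
```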
```diff
@@ -940,18 +940,47 @@ def _create_completion(
     completion_id: str = f"cmpl-{str(uuid.uuid4())}"
     created: int = int(time.time())
+    prefix_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.prefix_token_id")
+    middle_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.middle_token_id")
+    suffix_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.suffix_token_id")
     # If prompt is empty, initialize completion with BOS token to avoid
     # detokenization including a space at the beginning of the completion
     completion_tokens: List[int] = [] if len(prompt) > 0 else [self.token_bos()]
     # Add blank space to start of prompt to match OG llama tokenizer
```
What blank space does this comment refer to?
Also, later on in this method I see `prompt_tokens[1:]` and comments about removing BOS, but BOS is explicitly skipped, so is this actually an attempt to skip the "blank space"?
It's to avoid a bug when generating from a completely empty prompt that causes a leading space to always be added, but I think it may actually be incorrect for non-llama tokenizers; I'll have to check.
In some cases llama.cpp will make a guess at the FIM tokens, so those are used as a fallback when there's no metadata.
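A sketch of that fallback logic, assuming GGUF metadata values arrive as strings and that the internal model object exposes `token_prefix()`-style accessors (both are assumptions about llama-cpp-python internals; names are illustrative):

```python
from typing import Optional

LLAMA_TOKEN_NULL = -1  # llama.cpp convention for "no such token" (assumption)

def resolve_fim_token(metadata_value: Optional[str], guessed_id: int) -> Optional[int]:
    """Prefer the GGUF metadata value; otherwise use llama.cpp's guess.

    GGUF metadata values are strings here, hence the int() cast.
    """
    if metadata_value is not None:
        return int(metadata_value)
    return guessed_id if guessed_id != LLAMA_TOKEN_NULL else None

# Illustrative usage inside _create_completion:
# prefix_token_id = resolve_fim_token(
#     self.metadata.get("tokenizer.ggml.prefix_token_id"),
#     self._model.token_prefix(),  # hypothetical low-level accessor
# )
```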
@CISC thank you for implementing this. And yeah, lots of moving pieces in this method (honestly very long overdue for a refactor), so avoiding doing anything fancy is ideal.
Note: `add_bos` is misnamed; it's actually `add_special` and can cause several special tokens to be added to the token list (and the `special` parameter is actually `parse_special`).
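To illustrate the distinction, a sketch against llama-cpp-python's high-level API (the model path and FIM sentinel strings are placeholders; which special tokens get added depends on the model's GGUF metadata):

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", vocab_only=True)  # path is illustrative

text = b"<fim_prefix>def add(a, b):<fim_suffix>    return a + b<fim_middle>"

# add_bos=True really means "add special tokens" (BOS, and for some models
# possibly more), while special=True really means "parse special tokens
# appearing in the text" instead of tokenizing them as plain bytes.
with_sentinels = llm.tokenize(text, add_bos=False, special=True)
as_plain_text = llm.tokenize(text, add_bos=False, special=False)
```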
I've left the original behavior in place when no FIM tokens are found, but this should perhaps be re-evaluated.
@CISC great contribution, thank you so much, merging now!