Proper fill-in-middle support #1386

Merged: 8 commits into abetlen:main on May 8, 2024

Conversation

CISC (Contributor) commented Apr 26, 2024

Use prefix/middle/suffix tokens when metadata is present in GGUF, like f.ex. in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.
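
For context, this is roughly how the feature is exercised from the high-level API once the GGUF carries those token ids; a minimal sketch, with the model path and sampling settings as placeholders rather than anything from the PR itself:

```python
from llama_cpp import Llama

# Placeholder path; assumes a GGUF that carries the FIM token metadata,
# e.g. the CodeQwen1.5 quant linked above.
llm = Llama(model_path="codeqwen-1_5-7b-chat.Q4_K_M.gguf")

out = llm.create_completion(
    prompt="def add(a, b):\n",        # code before the hole
    suffix="\n\nprint(add(1, 2))\n",  # code after the hole
    max_tokens=64,
)
print(out["choices"][0]["text"])      # the generated "middle"
```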
CISC (Contributor, Author) commented Apr 26, 2024

See this PR on how to add this metadata to GGUFs that do not have them (recent CodeLlama and CodeGemma GGUFs should have gotten them automatically on conversion).
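
A quick way to check whether a given GGUF already has these keys is something like the following; a sketch using the gguf Python package, where the scalar-field value access assumes GGUFReader's parts/data layout:

```python
from gguf import GGUFReader  # pip install gguf

FIM_KEYS = (
    "tokenizer.ggml.prefix_token_id",
    "tokenizer.ggml.middle_token_id",
    "tokenizer.ggml.suffix_token_id",
)

reader = GGUFReader("model.gguf")  # placeholder path
for key in FIM_KEYS:
    field = reader.fields.get(key)
    if field is None:
        print(f"{key}: missing")
    else:
        # scalar fields keep their value in the part indexed by field.data[0]
        print(f"{key}: {field.parts[field.data[0]][0]}")
```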

CISC (Contributor, Author) commented Apr 26, 2024

@abetlen BTW, I initially thought of making prompt_tokens a generator expression so I could use chain() instead of essentially copying the whole token list, but since it's referenced as a list in so many places I opted for the less invasive approach instead.
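
Roughly the trade-off being described; a sketch rather than the PR's actual code, with placeholder ids and token lists:

```python
import itertools
from typing import Iterator, List

prefix_token_id, suffix_token_id, middle_token_id = 1, 2, 3  # placeholders
prefix_tokens: List[int] = [10, 11, 12]  # tokenized prompt (code before the hole)
suffix_tokens: List[int] = [20, 21]      # tokenized suffix (code after the hole)

# Less invasive approach: build a new list, which copies both token lists but
# keeps prompt_tokens usable everywhere a list is expected.
prompt_tokens: List[int] = (
    [prefix_token_id] + prefix_tokens + [suffix_token_id] + suffix_tokens + [middle_token_id]
)

# Generator alternative: chain() avoids the copy, but everything downstream
# that does len(prompt_tokens) or prompt_tokens[1:] would have to change.
prompt_tokens_lazy: Iterator[int] = itertools.chain(
    (prefix_token_id,), prefix_tokens, (suffix_token_id,), suffix_tokens, (middle_token_id,)
)
```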

@@ -940,18 +940,47 @@ def _create_completion(

completion_id: str = f"cmpl-{str(uuid.uuid4())}"
created: int = int(time.time())
prefix_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.prefix_token_id")
middle_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.middle_token_id")
suffix_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.suffix_token_id")
# If prompt is empty, initialize completion with BOS token to avoid
# detokenization including a space at the beginning of the completion
completion_tokens: List[int] = [] if len(prompt) > 0 else [self.token_bos()]
# Add blank space to start of prompt to match OG llama tokenizer
CISC (Contributor, Author) commented on this line, Apr 27, 2024

What blank space does this comment refer to?

Also, later in this method I see prompt_tokens[1:] and comments about removing BOS, but BOS is explicitly skipped, so is this actually an attempt to skip the "blank space"?

abetlen (Owner) replied:

It's to avoid a bug when generating from a completely empty prompt that causes a leading space to always be added, but I think it may actually be incorrect for non-llama tokenizers; I'll have to check.

CISC added 2 commits April 27, 2024 18:03
In some cases llama.cpp will make a guess at fim tokens, use them if there's no metadata.
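
The fallback described in that commit amounts to something like this; a sketch with illustrative names, assuming llama.cpp exposes its guessed ids through token_prefix()/token_middle()/token_suffix()-style accessors and reports -1 when it found nothing:

```python
from typing import Optional

def resolve_fim_token_id(metadata: dict, key: str, guessed_id: int) -> Optional[int]:
    # Prefer the id stored in GGUF metadata (values in self.metadata are strings),
    # otherwise fall back to the id llama.cpp guessed from the vocabulary.
    # llama.cpp uses -1 for "not found"; map that to None ("no FIM support").
    raw = metadata.get(key)
    token_id = int(raw) if raw is not None else guessed_id
    return token_id if token_id >= 0 else None

# e.g. resolve_fim_token_id(self.metadata, "tokenizer.ggml.prefix_token_id",
#                           self._model.token_prefix())
```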
abetlen (Owner) commented Apr 28, 2024

@CISC thank you for implementing this. And yeah, lots of moving pieces in this method (honestly very long overdue for a refactor), so avoiding doing anything fancy is ideal.

Note: add_bos is misnamed; it's actually add_special and can cause several special tokens to be added to the token list (the special parameter is actually parse_special).
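
Concretely, the naming pitfall looks like this; a sketch with a placeholder model path, and whether "<fim_prefix>" maps to a single token depends on the model's vocabulary:

```python
from llama_cpp import Llama

llm = Llama(model_path="codeqwen-1_5-7b-chat.Q4_K_M.gguf")  # placeholder path

# add_bos is passed through as llama.cpp's add_special: it may add BOS and any
# other "add to start" special tokens the model defines.
# special is passed through as parse_special: whether special-token text in the
# input is recognized as single tokens or tokenized as literal characters.
literal = llm.tokenize(b"<fim_prefix>", add_bos=False, special=False)  # several tokens
parsed = llm.tokenize(b"<fim_prefix>", add_bos=False, special=True)    # one token, if defined
```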
CISC added 2 commits May 7, 2024 13:43
I've left the original behavior when no FIM tokens are found, but this should perhaps be re-evaluated.
CISC requested a review from abetlen on May 7, 2024 at 12:13
abetlen (Owner) commented May 8, 2024

@CISC great contribution thank you so much, merging now!

abetlen merged commit 4a7122d into abetlen:main on May 8, 2024
16 checks passed
CISC deleted the fill-in-middle-support branch on May 8, 2024 at 06:40