Proper fill-in-middle support #1386
Conversation
Use prefix/middle/suffix tokens when metadata is present in GGUF, e.g. in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.
See this PR on how to add this metadata to GGUFs that do not have it (recent CodeLlama and CodeGemma GGUFs should have gotten it automatically on conversion).
@abetlen BTW, I initially thought of making `prompt_tokens` a generator expression to be able to use `chain()` instead of basically copying the whole token list, but since it's referenced as a list in so many places I opted for the less invasive method instead.
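For context, a minimal sketch (not the PR's actual code) of how a fill-in-middle prompt is typically assembled from these three sentinel tokens, assuming a `tokenize` callable with llama-cpp-python's signature:

```python
from typing import List

def build_fim_prompt_tokens(
    tokenize,              # e.g. a Llama.tokenize-style callable (assumption)
    prefix: str,
    suffix: str,
    prefix_token_id: int,
    suffix_token_id: int,
    middle_token_id: int,
) -> List[int]:
    # Standard PSM (prefix-suffix-middle) layout:
    #   <FIM_PREFIX> prefix tokens <FIM_SUFFIX> suffix tokens <FIM_MIDDLE>
    # The model then generates the missing "middle" after the final sentinel.
    tokens: List[int] = [prefix_token_id]
    tokens += tokenize(prefix.encode("utf-8"), add_bos=False, special=False)
    tokens.append(suffix_token_id)
    tokens += tokenize(suffix.encode("utf-8"), add_bos=False, special=False)
    tokens.append(middle_token_id)
    return tokens
```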
```diff
@@ -940,18 +940,47 @@ def _create_completion(
     completion_id: str = f"cmpl-{str(uuid.uuid4())}"
     created: int = int(time.time())
+    prefix_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.prefix_token_id")
+    middle_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.middle_token_id")
+    suffix_token_id: Optional[int] = self.metadata.get("tokenizer.ggml.suffix_token_id")
     # If prompt is empty, initialize completion with BOS token to avoid
     # detokenization including a space at the beginning of the completion
     completion_tokens: List[int] = [] if len(prompt) > 0 else [self.token_bos()]
     # Add blank space to start of prompt to match OG llama tokenizer
```
What blank space does this comment refer to?
Also, later on in this method I see `prompt_tokens[1:]` and comments about removing BOS, but BOS is explicitly skipped, so is this actually an attempt to skip the "blank space"?
It's to avoid a bug when generating from a completely empty prompt that causes a leading space to always be added, but I think it may actually be incorrect for non-llama tokenizers; I'll have to check.
In some cases llama.cpp will make a guess at the FIM tokens, so those are used as a fallback when there's no metadata.
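A sketch of that fallback logic, assuming GGUF metadata values arrive as strings and that the internal model object exposes `token_prefix()`-style accessors (both are assumptions about llama-cpp-python internals; names are illustrative):

```python
from typing import Optional

LLAMA_TOKEN_NULL = -1  # llama.cpp convention for "no such token" (assumption)

def resolve_fim_token(metadata_value: Optional[str], guessed_id: int) -> Optional[int]:
    """Prefer the GGUF metadata value; otherwise use llama.cpp's guess.

    GGUF metadata values are strings here, hence the int() cast.
    """
    if metadata_value is not None:
        return int(metadata_value)
    return guessed_id if guessed_id != LLAMA_TOKEN_NULL else None

# Illustrative usage inside _create_completion:
# prefix_token_id = resolve_fim_token(
#     self.metadata.get("tokenizer.ggml.prefix_token_id"),
#     self._model.token_prefix(),  # hypothetical low-level accessor
# )
```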
@CISC thank you for implementing this. And yeah, lots of moving pieces in this method (honestly very long overdue for a refactor), so avoiding doing anything fancy is ideal.
Note: `add_bos` is misnamed; it's actually `add_special` and can cause several special tokens to be added to the token list (and the `special` parameter is actually `parse_special`).
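To illustrate the distinction, a sketch against llama-cpp-python's high-level API (the model path and FIM sentinel strings are placeholders; which special tokens get added depends on the model's GGUF metadata):

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", vocab_only=True)  # path is illustrative

text = b"<fim_prefix>def add(a, b):<fim_suffix>    return a + b<fim_middle>"

# add_bos=True really means "add special tokens" (BOS, and for some models
# possibly more), while special=True really means "parse special tokens
# appearing in the text" instead of tokenizing them as plain bytes.
with_sentinels = llm.tokenize(text, add_bos=False, special=True)
as_plain_text = llm.tokenize(text, add_bos=False, special=False)
```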
I've left the original behavior in place when no FIM tokens are found, but this should perhaps be re-evaluated.
@CISC great contribution, thank you so much, merging now!