
Feature request: Option to disable auto adding BOS token (double BOS token) if it's already present/added. #917


Open
Spacellary opened this issue Jun 11, 2024 · 6 comments


@Spacellary

Spacellary commented Jun 11, 2024

How to disable this automatic behavior? And if it's not possible yet, can we get a --flag for it?

llama_tokenize_internal: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token.

Running into this with Llama-3-8B models.

Related PR:
ggml-org#7332
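For illustration, here is a minimal sketch of how the double BOS arises. This is not KoboldCpp's actual code; the BOS id and the toy tokenizer are stand-ins (128000 is Llama-3's `<|begin_of_text|>` id). The problem appears when the prompt text already begins with the BOS string and the backend also prepends the token:

```python
BOS_ID = 128000  # Llama-3 <|begin_of_text|> id, used here only for illustration
BOS_STR = "<|begin_of_text|>"

def fake_tokenize(text, add_bos):
    # Stand-in tokenizer: maps the literal BOS string to its id and
    # every other word to a dummy id.
    tokens = []
    if text.startswith(BOS_STR):
        tokens.append(BOS_ID)
        text = text[len(BOS_STR):]
    tokens += [hash(w) % 1000 for w in text.split()]
    if add_bos:
        tokens.insert(0, BOS_ID)  # backend prepends BOS unconditionally
    return tokens

# Frontend already put BOS in the prompt text (e.g. via the chat template):
toks = fake_tokenize(BOS_STR + "Hello there", add_bos=True)
print(toks[:2])  # both leading tokens are BOS, which triggers the warning
```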

@LostRuins
Owner

How are you getting this? KoboldCpp automatically adds a BOS token at the start of the prompt, you don't have to add your own.

@Spacellary
Author

The model was converted to GGUF using the original configs from its own repo:

https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2

https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2/blob/main/tokenizer_config.json#L2052

@Spacellary
Author

Spacellary commented Jun 13, 2024

Just so models don't have to be reconverted: after manually removing the bos_token from the tokenizer_config.json template, would it be possible to control this behavior on the backend side, i.e. the automatic addition of the bos_token?

Removing it from the template alone does not help at all.

Similar situation from abetlen/llama-cpp-python.
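One possible shape for the requested option, sketched below. This is purely hypothetical, not an existing KoboldCpp or llama.cpp API: after tokenization, collapse a run of duplicated leading BOS tokens down to a single one, so already-converted models would not need to be touched:

```python
def dedupe_leading_bos(tokens, bos_id):
    """Collapse a run of leading BOS tokens down to exactly one.

    Hypothetical post-tokenization step; leaves prompts with a single
    (or no) leading BOS untouched.
    """
    i = 0
    while i + 1 < len(tokens) and tokens[i] == bos_id and tokens[i + 1] == bos_id:
        i += 1
    return tokens[i:]

print(dedupe_leading_bos([1, 1, 42, 7], bos_id=1))  # [1, 42, 7]
```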

@Spacellary Spacellary changed the title Possible to disable auto adding BOS token (double BOS token)? [Feature] Option to disable auto adding BOS token (double BOS token) if it's already present/added. Jun 14, 2024
@Spacellary Spacellary changed the title [Feature] Option to disable auto adding BOS token (double BOS token) if it's already present/added. Feature request: Option to disable auto adding BOS token (double BOS token) if it's already present/added. Jun 14, 2024
@Spacellary
Author

Spacellary commented Jun 14, 2024

@LostRuins – I should clarify for transparency/investigation:

How are you getting this? KoboldCpp automatically adds a BOS token at the start of the prompt, you don't have to add your own.

Using the latest release of KCPP as my inference backend on Windows 11, CuBLAS, connected to SillyTavern where I am interacting with the model/character.

If you believe this should be handled upstream, please let me know, honestly I'm not sure myself.

@LostRuins
Owner

Hmm okay so the issue is, let's say I manually edit the prompt to remove the first BOS token if the user adds another one. What if they add 2 BOS tokens instead? Or what if they actually want to have 2, 3, or more BOS tokens? Changing the BOS behavior based on what they send in the prompt seems kind of finicky - either the backend should add a BOS automatically or it shouldn't at all - then the frontend can expect consistent behavior.

Fortunately, this doesn't actually seem to be an issue - having a double BOS in the prompt does not seem to negatively impact output quality at all; the first one is just ignored.
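The all-or-nothing policy described here can be sketched as follows (illustrative Python only; the function name and token ids are made up for the example). The backend either always prepends BOS or never does, without inspecting the prompt, so frontends see consistent behavior:

```python
def build_tokens(prompt_tokens, bos_id, add_bos=True):
    # The backend does not inspect prompt_tokens for an existing BOS;
    # it applies one fixed policy so frontends can rely on it.
    return ([bos_id] if add_bos else []) + prompt_tokens

print(build_tokens([5, 6], bos_id=1, add_bos=True))       # [1, 5, 6]
print(build_tokens([1, 5, 6], bos_id=1, add_bos=True))    # [1, 1, 5, 6] - double BOS
print(build_tokens([1, 5, 6], bos_id=1, add_bos=False))   # [1, 5, 6]
```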

@Spacellary
Author

Spacellary commented Jun 14, 2024

What if they add 2 BOS tokens instead? Or what if they actually want to have 2,3, or more BOS tokens?

This would be optional, of course. But I also agree that the outputs should be consistent for the sake of the frontends.

Fortunately, this doesn't actually seem to be an issue - having a double BOS in the prompt does not seem to negatively impact output quality at all, the first one is just ignored.

I was wondering about that, since I didn't notice any issues other than the new warning (added upstream), but it makes people think something is wrong.

From the user side, it looks like either the model or the backend is doing something incorrectly. Considering that the warning persists even after manually changing the model's chat_template, I'm not sure.
