Feature request: Option to disable auto adding BOS token (double BOS token) if it's already present/added. #917
Comments
How are you getting this? KoboldCpp automatically adds a BOS token at the start of the prompt; you don't have to add your own.
The model was converted to GGUF using the original configs from its own repo: https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2 https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2/blob/main/tokenizer_config.json#L2052
This does not help at all.
@LostRuins – I should clarify for transparency/investigation: I'm using the latest release of KoboldCpp (KCPP) as my inference backend on Windows 11 with CuBLAS, connected to SillyTavern, where I interact with the model/character. If you believe this should be handled upstream, please let me know; honestly, I'm not sure myself.
Hmm, okay, so the issue is: let's say I manually edit the prompt to remove the first BOS token if the user adds another one. What if they add 2 BOS tokens instead? Or what if they actually want to have 2, 3, or more BOS tokens? Changing the BOS behavior based on what they send in the prompt seems kind of finicky - either the backend should add a BOS automatically or it shouldn't at all - then the frontend can expect consistent behavior. Fortunately, this doesn't actually seem to be an issue - having a double BOS in the prompt does not seem to negatively impact output quality at all; the first one is just ignored.
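For illustration, here is a minimal Python sketch of how a prompt ends up with two BOS tokens when the frontend's template and the backend each add one. This is not KoboldCpp's actual code; the token IDs and function names are made up for the example.

```python
# Illustrative only - not KoboldCpp's implementation.
BOS_ID = 128000  # Llama 3 <|begin_of_text|> id, used here purely as an example

def frontend_prompt_ids(user_ids: list[int]) -> list[int]:
    # A frontend/chat template that already emits its own BOS.
    return [BOS_ID] + user_ids

def backend_tokenize(prompt_ids: list[int], add_bos: bool = True) -> list[int]:
    # The backend prepends BOS whenever add_bos is enabled; a hypothetical
    # option to disable auto-BOS would amount to passing add_bos=False here.
    return ([BOS_ID] if add_bos else []) + prompt_ids

ids = backend_tokenize(frontend_prompt_ids([15339, 1917]))
print(ids[:2])  # [128000, 128000] -> the "double BOS" case the warning reports
```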
This would be optional, of course. But I also agree that the outputs should be consistent for the sake of the frontends.
I was wondering about that, since I didn't notice any issues other than the new warning once it was added upstream, but it makes people think something is wrong. From the user's side, it looks like either the model or the backend is doing something incorrectly. And considering that the warning still persists even after manually changing the model's chat_template, I'm not sure.
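As a sanity check outside of KoboldCpp, one way to see whether the extra BOS comes from the model's own chat template is to render it with Hugging Face transformers and check whether the rendered text already begins with the BOS token. This is only a diagnostic sketch against the linked Stheno repo; what it prints depends on the template actually shipped in its tokenizer config.

```python
# Diagnostic sketch: does the model's chat template already emit the BOS text?
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Sao10K/L3-8B-Stheno-v3.2")
rendered = tok.apply_chat_template(
    [{"role": "user", "content": "Hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
# If this prints True, the template itself supplies a BOS, so a backend that
# also prepends one produces the double BOS the warning is about.
print(rendered.startswith(tok.bos_token))
```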
How can I disable this automatic behavior? And if it's not possible yet, can we get a --flag for it?
```
llama_tokenize_internal: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token.
```
Running into this with Llama-3-8B models.
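For context, the warning appears to be informational: generation still proceeds, and it just flags that a BOS was auto-added to a prompt that already started with one. A rough Python rendering of that condition (names here are illustrative; the real check lives in llama.cpp's tokenizer) might look like:

```python
# Rough rendering of the condition behind the warning; illustrative names only.
def tokenize(prompt_ids: list[int], bos_id: int, add_bos: bool) -> list[int]:
    out = ([bos_id] if add_bos else []) + prompt_ids
    if add_bos and len(out) > 1 and out[0] == bos_id == out[1]:
        print("warning: added a BOS token, but the prompt already starts with one")
    return out
```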
Related PR:
ggml-org#7332