<think> tags are not removed when set to be removed if the closing </think> isn't generated in the same pass. #1359
Comments
Hmm, it's a little tricky to take this approach. For now, you can try instead increasing the max output "number of tokens" to above 512. This is actually possible - in Lite just manually edit the value above the slider. Note that it's not advisable to increase this value beyond 25% of your max context size. So if your max context size is 8192, then you can safely set your "max output" to 2048, which should ensure that the closing [/think] tag is generated within the same pass. Does this help?
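As a quick back-of-the-envelope check of that sizing rule, here is a tiny sketch (the function name is illustrative, not an actual KoboldCpp setting):

```python
def safe_max_output(max_context_size: int, fraction: float = 0.25) -> int:
    """Largest "max output" that stays within the suggested 25% of context."""
    return int(max_context_size * fraction)

# e.g. an 8192-token context allows a max output of 2048 tokens,
# which leaves enough room for long [think] blocks to close in one pass.
print(safe_max_output(8192))  # 2048
```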
Hey @LostRuins, I was wondering if it might be possible to implement the s1: Simple test-time scaling technique from this paper, using this implementation, for DeepSeek R1-type models? It seems like it could make even smaller models perform better!
You could simulate it manually in story mode - when you start getting a response, delete it and replace it with a "Wait" instead.
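For anyone who wants to automate that manual trick rather than edit the story by hand, here is a rough sketch of the idea in Python. It assumes a KoboldCpp-style `/api/v1/generate` endpoint that takes a prompt and `max_length` and returns `results[0].text` - treat the URL, field names, and response shape as assumptions to verify against your version's API docs, not a confirmed contract.

```python
import requests

API_URL = "http://localhost:5001/api/v1/generate"  # assumed local KoboldCpp endpoint; adjust to your setup


def generate(prompt: str, max_length: int = 512) -> str:
    # Assumed request/response shape - check your KoboldCpp version's API docs.
    payload = {"prompt": prompt, "max_length": max_length}
    resp = requests.post(API_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()["results"][0]["text"]


def think_with_wait(prompt: str, extra_passes: int = 2) -> str:
    """Crude s1-style budget forcing: each time the model closes its think block,
    cut the text at the closing tag and append "Wait" to push it to keep reasoning."""
    text = generate(prompt)
    for _ in range(extra_passes):
        if "</think>" not in text:
            break
        # Keep only the reasoning before the closing tag, then nudge the model onward.
        text = text.split("</think>")[0] + "\nWait"
        text += generate(prompt + text)
    return prompt + text
```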
Aye, that will work. Or I can just keep manually removing them (feels clunky though). Thanks!
Hi, this feature has been added as a toggle in the latest release.
I tested it out. Pretty impressive how not submitting the [think] blocks keeps the context nice and efficient. I still think it would be better to somehow set a specific number of words that must exist between the closing think tag and the most recent submission from the user.

While testing this update I encountered a weird quirk: there were about 240 tokens of think before the model began to output its actual response, then it ran out of tokens to use, and because I had it set so that the think tags weren't submitted, it basically started its [think] phase all over again. This will be a recurring issue whenever the model runs out of tokens near the closing [/think]. Somehow telling kcpp to only incorporate the last or most recent think tags would be the best solution imo. I know it would cause delays while kcpp updates the context window, but I think it would be worth it for people like me who want to use kcpp for long-form writing.

Either way, it's much better with these new features. Great work and thank you.
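To make the "only keep the most recent think block" idea concrete, here is a minimal sketch (plain Python, not KoboldCpp's actual code) of what that context cleanup might look like, using the bracketed tag style from this thread:

```python
import re

# Matches a completed [think]...[/think] block (bracket style used in this thread).
THINK_BLOCK = re.compile(r"\[think\].*?\[/think\]", re.DOTALL)

def keep_only_latest_think(context: str) -> str:
    """Strip every completed think block from the context except the most recent one."""
    blocks = THINK_BLOCK.findall(context)
    for block in blocks[:-1]:  # leave the last block in place
        context = context.replace(block, "", 1)
    return context
```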
Describe the Issue
[think] tags are not stripped from the output, even when "remove" is set on think tags, if the closing [/think] tag is not output in the same pass.
Sometimes thinking requires more than 512 tokens of output, and if the opening [think] and the closing [/think] are not delivered within the same inference pass, kcpp fails to remove the [think]...[/think] block from the context window.
However, they are removed properly if [think] and [/think] are delivered in the same pass.
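A minimal sketch of stripping logic that would also cover the unterminated case (not KoboldCpp's actual implementation; the tag style follows the brackets used in this report):

```python
import re

# Matches a closed [think]...[/think] block, or an unterminated [think] block
# that runs to the end of the text - the situation described in this issue.
THINK_PATTERN = re.compile(r"\[think\].*?(?:\[/think\]|\Z)", re.DOTALL)

def strip_think_blocks(text: str) -> str:
    """Remove think blocks even when the closing tag was never generated in this pass."""
    return THINK_PATTERN.sub("", text)
```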
Additional Information:
Kcpp 1.83
PS. I have intentionally used the [] (square brackets) in this post so that they show up correctly.
Also, I would like to propose added functionality: