
<think> tags are not removed when set to be removed if the closing </think> isn't generated in the same pass. #1359

Open
wh33t opened this issue Feb 8, 2025 · 6 comments


wh33t commented Feb 8, 2025

Describe the Issue
[think] tags are not stripped from the output, even when think tags are set to "remove", if the closing [/think] tag is not output in the same pass.

Sometimes thinking requires more than 512 tokens of output, and if the opening [think] and the closing [/think] are not delivered within the same inference pass, kcpp fails to remove the [think]block[/think] from the context window.

However, they are removed properly if [think] and [/think] are delivered in the same pass.
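My guess at why (purely a hypothetical sketch on my part, not kcpp's actual code): if the removal is done with a non-greedy regex over complete pairs, an unterminated block never matches and so survives in the context:

```python
import re

# Hypothetical sketch of the failure mode, not kcpp's actual implementation.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    # Only complete <think>...</think> pairs match; an opening tag whose
    # closing tag arrives in a later pass is left in the context untouched.
    return THINK_RE.sub("", text)

print(strip_think("<think>short</think> answer"))   # -> " answer"
print(strip_think("<think>truncated mid-thought"))  # unchanged: no closing tag
```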

Additional Information:
Kcpp 1.83

PS. I have intentionally used the [] (square brackets) in this post so that they show up correctly.

Also, I would like to propose added functionality:

  1. A settable parameter, "[Think] tag auto-removal threshold", set to an integer. Say we set it to 1000: after 1000 tokens/words of output AFTER a closing [/think], that whole [think]block[/think] is removed. Clearly the think block needs to exist to help predict the output, but after a while it just eats up precious context window and VRAM. Also keep in mind that you wouldn't necessarily want to strip every think block in the context window after 1000 tokens/words; you'd want to strip only the earliest think block in the context window (roughly as sketched below).
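Something like this rough sketch (all names hypothetical, with a crude whitespace word count standing in for a real tokenizer):

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def prune_earliest_think(context: str, threshold: int = 1000) -> str:
    """Hypothetical: drop the earliest complete think block once
    `threshold` words of output have accumulated after its closing tag."""
    match = THINK_BLOCK.search(context)  # earliest block only
    if match is None:
        return context
    trailing = context[match.end():]
    if len(trailing.split()) >= threshold:  # crude word count
        return context[:match.start()] + trailing
    return context
```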
@LostRuins
Owner

Hmm, it's a little tricky to take this approach. For now, you can instead try increasing the max output "number of tokens" to above 512. This is actually possible - in Lite, just manually edit the value above the slider.

[Image: the "max output" value field above the slider in Lite]

Note that it's not advisable to increase this value beyond 25% of your max context size. So if your max context size is 8192, you can safely set your "max output" to 2048, which should ensure that the closing </think> is captured within the same request.
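As a quick sanity check on that rule of thumb (plain arithmetic, nothing kcpp-specific):

```python
max_context = 8192
max_output = max_context // 4  # 25% of context -> 2048
print(max_output)
```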

Does this help?


x-legion commented Feb 9, 2025

Hey @LostRuins, I was wondering if it might be possible to implement the s1: Simple test-time scaling technique from this paper, using this implementation, for DeepSeek R1-type models? It seems like it could make even smaller models perform better!

@LostRuins
Owner

You could simulate it manually in story mode: when you start getting a response, delete it and replace it with a "Wait" instead.
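If you'd rather script it than do it by hand, a rough sketch against the generate endpoint might look like this (the endpoint and field names here are assumed from the standard KoboldAI API that kcpp exposes; adjust to your setup):

```python
import requests

# Assumed default local KoboldCpp endpoint; change host/port as needed.
API = "http://localhost:5001/api/v1/generate"

def generate(prompt: str, max_length: int = 256) -> str:
    r = requests.post(API, json={"prompt": prompt, "max_length": max_length})
    r.raise_for_status()
    return r.json()["results"][0]["text"]

# Crude s1-style budget forcing: each time the model tries to close its
# reasoning, cut the closing tag and append "Wait," to extend the thinking.
prompt = "<think>Let me work through this step by step."
for _ in range(3):
    out = generate(prompt)
    if "</think>" in out:
        out = out.split("</think>", 1)[0] + "\nWait,"
    prompt += out
print(prompt)
```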

Author

wh33t commented Feb 9, 2025

> Hmm, it's a little tricky to take this approach. For now, you can instead try increasing the max output "number of tokens" to above 512. This is actually possible - in Lite, just manually edit the value above the slider.
>
> Note that it's not advisable to increase this value beyond 25% of your max context size. So if your max context size is 8192, you can safely set your "max output" to 2048, which should ensure that the closing </think> is captured within the same request.
>
> Does this help?

Aye, that will work. Or I can just keep manually removing them (feels clunky though).

Thanks!

@LostRuins
Owner

Hi, this feature has been added as a toggle in the latest release.

Author

wh33t commented Mar 1, 2025

I tested it out. Pretty impressive how not submitting the [think] blocks keeps the context nice and efficient.

I still think it would be better to somehow set a specific number of words that must exist between the closing think tag and the most recent submission from the user.

While testing this update out I encountered a weird quirk: there was about 240 tokens of thinking before the model began to output its actual response, then it ran out of tokens to use, and then, because I had it set so that the think tags weren't submitted, it basically started its [think] phase all over again. This will be a recurring issue if the model runs out of tokens near the closing [/think].

Somehow telling kcpp to incorporate only the last or most recent think block would be the best solution imo, roughly like the sketch below. I know it would cause delays while kcpp updates the context window, but I think it would be worth it for people like me who want to use kcpp for long-form writing.
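For illustration only (a hypothetical sketch, not kcpp code):

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def keep_last_think(context: str) -> str:
    """Hypothetical: strip every complete think block except the most
    recent one before resubmitting the context."""
    blocks = list(THINK_BLOCK.finditer(context))
    # Delete from the back so earlier match offsets stay valid.
    for match in reversed(blocks[:-1]):
        context = context[:match.start()] + context[match.end():]
    return context
```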

Either way though, it's much better with these new features. Great work and thank you.
