Editing a long session causes a long recompute #1383
Comments
Yes, that's how it works. The old stuff in the context is gone, so when you revert to it again it must be reprocessed.
Is there a theoretical way the UI could compensate for this? Such as having the backend tell it where the context has shifted, and the UI declining to send anything from that point onwards regardless of edits? It would be nice to know what's being discarded anyway.
Yes, the solution is to manually truncate your story to keep it shorter, moving the excess into other places like notes or a different file. This only happens because you exceed the max context length, so text shifts out of the context and then gets pulled back in later.
@LostRuins is there any way to automate that process? A user script, perhaps? That'd be cool; we could even feed that to RAG if we wanted to... -Darth
Doing it automatically would cause the same issue you are facing now: the start of the context keeps changing, which triggers a reprocess. Doing it manually will only cause a reprocess when you modify it.
Manually truncating is a problem because I'd have to keep doing it, and I don't know how many tokens my text is using. It would also mean a recompute whenever I fold information into memory or something similar, since that puts something at the start of the context. I'd be interested in hearing how other people deal with this; do they just guess how many tokens they've used? Having some kind of ratchet mechanism for shifting in the UI seems like something that could work, where undoing doesn't move the context window back.
Hmm, outside of a mod I don't think there's any good solution right now. You could try adding an extra function before submit_generation is called that does something like: if the total length of the story exceeds the max context, truncate away the first half of the story. That would give a kind of ratcheting effect. Most people are fine with an occasional recompute; it only takes a few seconds on a GPU at most.
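A minimal sketch of that hook in TypeScript, assuming a mod can transform the story text right before submit_generation fires. MAX_CONTEXT_TOKENS, estimateTokens, and ratchetStory are hypothetical names, not real Kobold Lite APIs, and the ~4-characters-per-token estimate is a rough assumption:

```ts
// Assumed context limit; set this to match the backend's configured size.
const MAX_CONTEXT_TOKENS = 4096;

// Rough token estimate (~4 characters per token for English text).
// A real mod would use the backend's tokenizer count instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Once the story exceeds the limit, truncate away its first half, so the
// start of the context only moves occasionally (the "ratcheting effect")
// instead of on every generation. Call this on the story text just
// before submit_generation runs.
function ratchetStory(story: string): string {
  if (estimateTokens(story) > MAX_CONTEXT_TOKENS) {
    return story.slice(Math.floor(story.length / 2));
  }
  return story;
}
```

Because the truncation point only moves when the limit is crossed, the prefix of the context stays stable between generations, so the full reprocess happens once per truncation rather than on every undo or context shift.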
Huh, really? It takes about a minute or two on my CPU. 😅 Maybe I should buy a GPU. |
Yes, that would greatly speed up inference. |
Describe the Issue
I created a new session with a bot on KoboldLite and wrote enough to go far over the context limit. Many context shifts happen and tokens are erased. I then undo a few messages and write something. It then takes a long time to compute.
I'm not exactly sure what the intended behaviour is here, or how to fix it. I'm guessing this is a natural result of the frontend passing as many tokens as it can and the context shifting, so undoing tries to prepend text at the start of the context and causes a recompute. It would be nice to work around that somehow.
Additional Information:
I use this unit file to run koboldcpp:
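(The unit file would be something of roughly this shape; the paths and values below are illustrative assumptions, though koboldcpp's --model and --contextsize flags are real:)

```ini
[Unit]
Description=koboldcpp API server
After=network.target

[Service]
# Paths, model file, and context size are assumptions for illustration.
ExecStart=/usr/bin/python3 /opt/koboldcpp/koboldcpp.py \
    --model /opt/models/Cydonia-22B-v2k-Q4_K_M.gguf \
    --contextsize 4096
Restart=on-failure

[Install]
WantedBy=multi-user.target
```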
I'm using the Cydonia-22B-v2k-Q4_K_M, OuteTTS-0.3-1B-Q4_0, and whisper-large-v3-f16 models.
I'm using Arch Linux with an AMD Ryzen 7 3700X processor. No GPU acceleration is used.
Log and story textdata:
LOG.txt
STORY_TEXTDATA.txt