Editing a long session causes a long recompute #1383

Open
Jookia opened this issue Feb 20, 2025 · 9 comments

Jookia commented Feb 20, 2025

Describe the Issue
I created a new session with a bot on KoboldLite and wrote enough to go far over the context limit. Many context shifts happened and tokens were erased. I then undid a few messages and wrote something. Generation then took a long time to compute.

I'm not exactly sure what the intended behaviour is here, or how to fix it. I'm guessing this is a natural result of the frontend passing as many tokens as it can while the context shifts, so undoing effectively prepends text at the start of the context and forces a recompute. It would be nice to work around that somehow.
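(As a toy sketch of the shifting behavior described above, not KoboldCpp's actual logic, which also handles pinned sections like memory differently:)

```js
// Toy sketch of context shifting, not KoboldCpp's actual logic.
// A fixed-size window drops the oldest tokens as new ones arrive,
// so undoing back past the window start touches erased tokens and
// forces a reprocess.
function shiftContext(windowTokens, newTokens, maxLen) {
  const combined = windowTokens.concat(newTokens);
  return combined.slice(Math.max(0, combined.length - maxLen));
}

let ctx = [1, 2, 3, 4, 5, 6];
ctx = shiftContext(ctx, [7, 8], 6);
console.log(ctx); // [3, 4, 5, 6, 7, 8] -- tokens 1 and 2 are gone
```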

Additional Information:

I use this unit file to run koboldcpp:

[Unit]
Description=koboldcpp daemon

[Service]
AmbientCapabilities=
CapabilityBoundingSet=
DeviceAllow=
DynamicUser=yes
ExecStart=koboldcpp --quiet --whispermodel whisper.gguf --ttsmodel tts.gguf --ttswavtokenizer ttswavtokenizer.gguf model.gguf
IPAddressAllow=127.0.0.1
IPAddressDeny=any
LockPersonality=yes
MemoryDenyWriteExecute=yes
PrivateDevices=yes
PrivateMounts=yes
PrivatePIDs=yes
PrivateUsers=yes
ProcSubset=pid
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectProc=invisible
RemoveIPC=yes
RestrictAddressFamilies=AF_INET
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SecureBits=
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged
SystemCallFilter=~@resources
Type=simple
WorkingDirectory=/var/local/koboldcpp

[Install]
WantedBy=multi-user.target

I'm using the Cydonia-22B-v2k-Q4_K_M, OuteTTS-0.3-1B-Q4_0, and whisper-large-v3-f16 models.
I'm using Arch Linux with an AMD Ryzen 7 3700X processor. No GPU acceleration is used.

Log and story textdata:

LOG.txt
STORY_TEXTDATA.txt

@LostRuins (Owner)

Yes, that's how it works. The old stuff in the context is gone, so when you revert to it again it must be reprocessed.

Jookia (Author) commented Feb 20, 2025 via email

@LostRuins (Owner)

Yes, the solution is to manually truncate your story to keep it shorter, moving the excess somewhere else, like notes or a different file. This only happens because you exceed the max context length, so text shifts out of the context and then comes back in later.

@MrReplikant

@LostRuins is there any way to automate that process? A user script, perhaps? That'd be cool; we could even feed it to RAG if we wanted to...

-Darth

@LostRuins (Owner)

Doing it automatically would cause the same issue you are facing now: the start of the context keeps changing, causing a reprocess every time. Doing it manually only causes a reprocess when you modify the story.
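(The reprocessing cost comes down to KV-cache prefix reuse. A rough illustration, not KoboldCpp's actual code: cached state survives only for the longest unchanged token prefix, so anything that rewrites the start of the prompt invalidates nearly everything.)

```js
// Rough illustration of KV-cache prefix reuse, not KoboldCpp's
// actual code: only tokens in the longest common prefix of the
// cached prompt and the new prompt keep their cached state.
function reusablePrefixLength(cachedTokens, newTokens) {
  const limit = Math.min(cachedTokens.length, newTokens.length);
  let n = 0;
  while (n < limit && cachedTokens[n] === newTokens[n]) {
    n++;
  }
  return n;
}

// An edit in the middle keeps the prefix before it...
console.log(reusablePrefixLength([1, 2, 3, 4], [1, 2, 9, 4])); // 2
// ...but changing the very first token means reprocessing it all.
console.log(reusablePrefixLength([5, 2, 3, 4], [1, 2, 3, 4])); // 0
```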

Jookia (Author) commented Feb 21, 2025

Manually truncating is a problem because I'd have to keep doing it, and I don't know how many tokens my text uses. It would also mean a recompute whenever I fold information into memory or something similar, since that puts something at the start of the context. I'd be interested in hearing how other people deal with this; do they just guess how many tokens they've used?
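(One way to avoid guessing: KoboldCpp exposes a token-count endpoint in its extra API. A minimal sketch, assuming a default local instance on port 5001 and the /api/extra/tokencount route; the response fields may differ across versions.)

```js
// Minimal sketch: ask a local KoboldCpp instance how many tokens a
// piece of text uses. Assumes the default port 5001 and the
// /api/extra/tokencount route; response fields may vary by version.
async function countTokens(text) {
  const res = await fetch("http://127.0.0.1:5001/api/extra/tokencount", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: text }),
  });
  const data = await res.json();
  return data.value; // total token count for the text
}

countTokens("Once upon a time...").then((n) => console.log(`${n} tokens`));
```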

Having some kind of ratchet mechanism for shifting in the UI, where undoing doesn't move the context window back, seems like something that could work.

@LostRuins (Owner)

Hmm, outside of a mod I don't think there's any good solution right now.

You could try adding an extra function before submit_generation is called that does something like: if the total length of the story exceeds the max context, truncate away the first half of the story. That would give a kind of ratcheting effect.

Most people are fine with an occasional recompute; it only takes a few seconds on GPU at most.
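(A minimal sketch of that pre-submit hook, assuming hypothetical getStoryText/setStoryText accessors; the real Lite internals around submit_generation differ.)

```js
// Minimal sketch of the ratcheting truncation idea, not actual
// KoboldAI Lite code. getStoryText/setStoryText are hypothetical
// stand-ins for however a mod would read and write the story.
const MAX_CONTEXT_CHARS = 8192 * 4; // crude chars-per-token estimate

function ratchetTruncate(getStoryText, setStoryText) {
  const story = getStoryText();
  if (story.length > MAX_CONTEXT_CHARS) {
    // Cut away the first half, so the context start then stays
    // stable for many turns instead of shifting on every submit.
    setStoryText(story.slice(Math.floor(story.length / 2)));
  }
}
```

Because the cut happens only when the story outgrows the limit, the context start stays fixed between cuts, which is what gives the ratcheting effect.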

Jookia (Author) commented Feb 21, 2025

Huh, really? It takes about a minute or two on my CPU. 😅 Maybe I should buy a GPU.

@LostRuins (Owner)

Yes, that would greatly speed up inference.
