I tested a fixed sequence of prompts that needs a context of almost 4k tokens on a Mistral 7B Instruct model, with only one slot and no KV-cache defragmentation.
With a 4k context size, the sequence takes 3 minutes. With a 32k context size, the same sequence takes 11 minutes.
Why does an increased but unused context size have such an impact on performance?