Reduce size of default Roo prompt #1806
-
Some further investigation, and a slight change of plan... Both these implementation proposals assume that read_file would be given broader access permissions, so that the LLM could read instruction files stored outside the developer's working directory.
This would be needed because we don't want to store the instructions on how to create MCP servers etc. within the developer's working directory; they should live somewhere else (e.g. a folder in VS Code globalStorage). However, a better way forward IMO would be a new tool "get_instructions" which takes a defined set of additional parameters (e.g. "create_mcp", "create_new_mode") and provides additional instructions to the LLM on the topic in question. As well as not requiring read_file to have broader access permissions, it's also simpler to generate additional instructions dynamically rather than writing them to the file system, as this makes it easier to provide correct values for any dynamic parts of the instructions (e.g. file paths).
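A minimal sketch of the idea, purely for illustration (the function name, topic identifiers and context fields below are assumptions, not the actual Roo Code implementation):

```typescript
// Illustrative sketch only: the tool name, topic identifiers and context fields
// are assumptions for discussion, not the actual Roo Code implementation.
type InstructionTopic = "create_mcp_server" | "create_mode";

interface InstructionContext {
  mcpServersDir: string;   // e.g. a folder under VS Code globalStorage
  customModesPath: string; // wherever custom mode definitions live
}

// Generate topic-specific instructions on demand, substituting dynamic values
// (such as real file paths) at call time instead of baking them into the system
// prompt or writing instruction files into the workspace.
function getInstructions(topic: InstructionTopic, ctx: InstructionContext): string {
  if (topic === "create_mcp_server") {
    return [
      "To create a new MCP server, scaffold it under:",
      ctx.mcpServersDir,
      "(full MCP creation guidance would be generated here)",
    ].join("\n");
  }
  // topic === "create_mode"
  return [
    "To create a new mode, add a definition to:",
    ctx.customModesPath,
    "(full mode creation guidance would be generated here)",
  ].join("\n");
}
```

Because the instructions are generated at call time, paths and other environment-specific values are always current, and nothing extra needs to be written into the user's workspace.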
-
I like this! Let me know how I can help.
-
PR #1869 has been implemented to cover items 1 & 3 above. The next thing to look at here is whether fetch_instructions could be leveraged to make the old (lengthier) instructions available when needed. But it's not clear whether these are necessary... A couple of reference points:
Looking at the history of the existing tool descriptions in GitHub, they were added here: 344c796, and have generally had few revisions. There is no evidence of them having been finely tuned for optimal performance. My hunch remains that we could get away with simply replacing the tool descriptions with much more condensed versions, as originally described above. There seems to be evidence (from RooFlow, and from users describing their practice on Reddit) that this would be the case. If we wanted to establish this with greater certainty, the options I can see are:
1. Use benchmarking to compare before and after: #689. We would hope to see the same success rate, but lower cost. We would need some data on how the benchmark scores vary from run to run (I assume there is some stochastic variation) to assess whether any drop in performance is significant.
2. Build a bespoke test to evaluate tool use by sending a range of requests to LLMs that should trigger tool use, and checking the tool use that results.
The issue with option 1 is that we should be concerned about correct tool use across a range of LLMs. That suggests a lot of different benchmarking runs to test multiple LLMs. Approach 2 seems like it might be easier to run economically and efficiently over a range of different LLMs. I think it's worth giving some thought to a kind of "tool use benchmark" that we can use to evaluate how well a given combination of (system prompt + LLM) stimulates correct tool use.
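Such a bespoke check could be quite lightweight. The sketch below is hypothetical (the case format, the callModel callback and the XML-style tool-call check are all assumptions for illustration): send prompts that should trigger a specific tool and score whether the model's reply invokes that tool.

```typescript
// Hypothetical sketch of a "tool use benchmark"; the case format, the callModel
// callback and the XML-style tool-call check are assumptions for illustration.
interface ToolUseCase {
  prompt: string;       // user request that should trigger a tool call
  expectedTool: string; // tool we expect the model to invoke, e.g. "read_file"
}

async function scoreToolUse(
  callModel: (systemPrompt: string, userPrompt: string) => Promise<string>,
  systemPrompt: string,
  cases: ToolUseCase[],
): Promise<number> {
  let correct = 0;
  for (const c of cases) {
    const reply = await callModel(systemPrompt, c.prompt);
    // Crude check: does the reply contain a call to the expected tool?
    if (reply.includes(`<${c.expectedTool}>`)) correct++;
  }
  return correct / cases.length; // fraction of cases with correct tool selection
}
```

Running the same case set across (verbose prompt, condensed prompt) x (several models) would give a cheap comparison of tool-selection accuracy alongside token cost.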
-
This looks like a useful framework for regression testing of prompts, and evaluating tool use capabilities of different models:
-
Fully support the idea of making the system and user prompts dynamic.
Pain points
Proposal 1 - Local "Prompt Analyzer"
This sharply reduces token usage and keeps contradictory, off-topic content out of the prompt.
Request to the Roo Code team - "Prompt Preprocessor" API
We need an official hook between prompt construction and the call to the LLM.
How it could work
Benefits
MVP
Happy to help discuss the format or test a prototype.
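One possible shape for such a hook, offered purely as a discussion starter (the interface name, fields and section-matching regex below are invented here; this is not an existing Roo Code API):

```typescript
// Invented interface, offered as a discussion starter; not an existing Roo Code API.
interface PromptPreprocessor {
  // Called after Roo assembles the prompt and before the request is sent to the model.
  process(input: {
    systemPrompt: string;
    userMessage: string;
    mode: string; // e.g. "code", "architect"
  }): Promise<{ systemPrompt: string; userMessage: string }>;
}

// A trivial example: drop the MCP SERVERS section when the request doesn't mention MCP.
// (The section-matching regex is a placeholder for whatever structure the real prompt has.)
const dropMcpSectionWhenUnused: PromptPreprocessor = {
  async process({ systemPrompt, userMessage }) {
    if (/mcp/i.test(userMessage)) return { systemPrompt, userMessage };
    const trimmed = systemPrompt.replace(/^MCP SERVERS[\s\S]*?(?=^[A-Z][A-Z ]+$)/m, "");
    return { systemPrompt: trimmed, userMessage };
  },
};
```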
-
Related previous discussions:
#1335
#1000
#215
The default system prompt is currently around 11k tokens (per OpenAI tokenizer), and could be substantially reduced without loss of function.
Benefits of this are substantial:
Analysis of default system prompt
Here's an analysis of the 11k tokens in the default system prompt (code mode), and some options for what we can do with them, in length order.
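For anyone who wants to reproduce or re-check these per-section counts, a minimal token-counting sketch (assuming the js-tiktoken npm package is available and that cl100k_base is an acceptable proxy encoding for the models in question):

```typescript
// Minimal token-count sketch; assumes the js-tiktoken npm package is available and
// that the cl100k_base encoding is an acceptable proxy for the target models.
import { getEncoding } from "js-tiktoken";

function countTokens(text: string): number {
  const enc = getEncoding("cl100k_base");
  return enc.encode(text).length;
}

// e.g. compare countTokens(defaultSystemPrompt) vs. countTokens(condensedSystemPrompt)
```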
TOOL USE - 3.9k tokens
The bulk of this (3.2k tokens) covers descriptions & usage instructions of the 13 individual tools (250 tokens per tool). These descriptions are lengthy and can probably be optimised without loss of function.
Using an LLM, I have created compacted versions of these instructions with a combined length of 1.2k tokens (92 tokens per tool). Testing needed, but I suspect these descriptions would be adequate.
As a fallback, we could maintain the lengthier instructions in additional files that could be read by the LLM in the event it has difficulty operating a tool. Decision to be taken whether this is worthwhile, or whether there are alternative ways to mitigate the risk.
Further options would include:
MCP Servers - 3.2k tokens
The vast majority of this content concerns the creation of new MCP servers. This is a complex operation that will not be used in the vast majority of chats.
This could be moved into a separate file, with the LLM instructed to read this file only in the event the user asks to create a new MCP Server. This seems like a big & easy win.
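A sketch of what conditional assembly might look like (the section contents and option names below are placeholders, not Roo Code's real prompt sections):

```typescript
// Placeholder sections, for illustration only; not Roo Code's real prompt content.
const SECTIONS = {
  toolUse: "TOOL USE\n(condensed tool descriptions)",
  mcpOverview: "MCP SERVERS\n(connected servers only; no creation instructions)",
  rules: "RULES\n(default rules)",
  modes: "MODES\n(available modes and when to switch)",
};

// The ~3k tokens of "how to create an MCP server" guidance are deliberately absent;
// the model would load them on demand (e.g. via a get_instructions-style tool)
// only when the user actually asks to create a server.
function buildSystemPrompt(opts: { includeMcpOverview: boolean }): string {
  return [
    SECTIONS.toolUse,
    opts.includeMcpOverview ? SECTIONS.mcpOverview : "",
    SECTIONS.rules,
    SECTIONS.modes,
  ]
    .filter(Boolean)
    .join("\n\n");
}
```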
RULES - 1.6k tokens
The default rules are quite verbose, but most LLMs need to know this information most of the time, so there's no obvious easy saving here.
I suspect that the rule text could be edited to make it more compact / less verbose, without hurting functionality. An LLM could help with this summarization. However, testing whether these changes have an impact on compliance with the rules is a fairly tricky matter.
Potential here, but gains are not as easy as they are in other sections.
MODES - 755 tokens
The LLM needs to understand the different modes, so it can decide when to switch mode.
However, 500 of these tokens concern how to create new modes, which is a rare operation. This could be moved into a different file, to be loaded if/when the user asks to create a new mode.
Potentially, agents that don't have access to the "switch mode" function don't need to know about modes at all?
CAPABILITIES - 647 tokens
There is substantial overlap here with the tool descriptions, so there is likely scope to trim this section.
However, testing such changes to ensure no loss of capabilities is not a trivial matter.
SYSTEM INFORMATION - 210 tokens
The bulk of this (155 tokens) is a verbose description of how to use the list of filepaths, which can probably be condensed without loss of functionality.
OBJECTIVE - 441 tokens
This text is fairly verbose. ChatGPT suggested a 120 token version that it thinks will convey the same information.
(I used this prompt to get this)
"You will be provided with a rules that explains how to behave as an LLM interacting with an IDE as an AI code editor.
We want to encode these instructions in a way that makes it clear to an LLM what to do, but does so using fewer tokens, so that we preserve the LLM's context window for other purposes.
Please provide a revised desciption of the rules which will be as concise as possible, while still giving the LLM clear unambiguous advice about how to behave."
Again, careful testing would be needed to assess whether the condensed version is still giving the behaviour we want from Roo.
Potential savings
Easy:
- Replace the verbose tool descriptions with the condensed versions (3.2k down to ~1.2k tokens).
- Move the MCP server creation instructions out of the default prompt, loaded only when needed (~3k tokens).
- Move the mode creation instructions out of the default prompt in the same way (~0.5k tokens).
This takes us from 11k tokens down to 5.5k tokens.
Medium:
- Condense the RULES text.
- Trim the CAPABILITIES section where it overlaps with the tool descriptions.
- Condense the OBJECTIVE and SYSTEM INFORMATION text.
These changes potentially take us down to 3k-4k tokens.
Other modes
The analysis above focused on the "code" mode. The system prompts for other modes are very similar in length and content, and can probably be shortened in the same way.
Looking forward
As further features are added to Roo Code, there will likely be pressures to add further content to the default system prompt. Once we have done work to reduce the length of the prompt, we should also consider what is needed to prevent future bloat.
It may be sufficient to have some specific additional review guidelines for any text that ends up in the system prompt to ensure that it is (a) necessary, (b) as concise as possible and (c) only included when it is needed.
Proposed next steps
As immediate next steps, I propose 3 x PRs to tackle the "easy" potential savings detailed above:
Based on learning from the implementation and test of these PRs, we can then look at the further savings described as "medium" difficulty above. I suspect that learnings from the first 3 PRs will help a lot with direction here.
Comments / feedback very welcome. I plan on working on & submitting the 3 x PRs described above, so there will also be an opportunity for detailed feedback on the specifics of those proposals in the context of those PRs.