Reduce size of default Roo prompt #1806
-
Some further investigation, and a slight change of plan... Both these implementation proposals assume that read_file would be given broader access permissions, so that the LLM could read instruction files stored outside the developer's working directory.
This would be needed because we don't want to store the instructions on how to create MCP servers etc. within the developer's working directory; they should live somewhere else (e.g. a folder in VS Code globalStorage). However, a better way forward IMO would be a new tool "get_instructions" which takes a defined set of additional parameters (e.g. "create_mcp", "create_new_mode") and provides additional instructions to the LLM on the topic in question. As well as not requiring read_file to have broader access permissions, it's also simpler to generate additional instructions dynamically rather than writing them to the file system, as this makes it easier to provide correct values for any dynamic parts of the instructions (e.g. file paths).
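A minimal sketch of the idea, purely for illustration (the function name, topic identifiers and context fields below are assumptions, not the actual Roo Code implementation):

```typescript
// Illustrative sketch only: the tool name, topic identifiers and context fields
// are assumptions for discussion, not the actual Roo Code implementation.
type InstructionTopic = "create_mcp_server" | "create_mode";

interface InstructionContext {
  mcpServersDir: string;   // e.g. a folder under VS Code globalStorage
  customModesPath: string; // wherever custom mode definitions live
}

// Generate topic-specific instructions on demand, substituting dynamic values
// (such as real file paths) at call time instead of baking them into the system
// prompt or writing instruction files into the workspace.
function getInstructions(topic: InstructionTopic, ctx: InstructionContext): string {
  if (topic === "create_mcp_server") {
    return [
      "To create a new MCP server, scaffold it under:",
      ctx.mcpServersDir,
      "(full MCP creation guidance would be generated here)",
    ].join("\n");
  }
  // topic === "create_mode"
  return [
    "To create a new mode, add a definition to:",
    ctx.customModesPath,
    "(full mode creation guidance would be generated here)",
  ].join("\n");
}
```

Because the instructions are generated at call time, paths and other environment-specific values are always current, and nothing extra needs to be written into the user's workspace.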
-
I like this! Let me know how I can help.
-
PR #1869 has been implemented to cover items 1 & 3 above. The next thing to look at here is whether fetch_instructions could be leveraged to make the old (lengthier) instructions available when needed. But it's not clear whether these are necessary... A couple of reference points:
Looking at the history of the existing tool descriptions in GitHub, they were added here: 344c796, and have generally had few revisions. There is no evidence of them having been finely tuned for optimal performance. My hunch remains that we could get away with simply replacing the tool descriptions with much more condensed versions, as originally described above. There seems to be evidence (from RooFlow, and from users describing their practice on Reddit) that this would be the case. If we wanted to establish this with greater certainty, the options I can see are:
1. Use benchmarking to compare before and after: #689. We would hope to see the same success rate, but lower cost. We would need some data on how the benchmark scores vary from run to run (I assume there is some stochastic variation) to assess whether any drop in performance is significant.
2. Build a bespoke test to evaluate tool use by sending a range of requests to LLMs that should trigger tool use, and checking the tool use that results.
The issue with option 1 is that we should be concerned about correct tool use across a range of LLMs. That suggests a lot of different benchmarking runs to test multiple LLMs. Approach 2 seems like it might be easier to run economically and efficiently over a range of different LLMs. I think it's worth giving some thought to a kind of "tool use benchmark" that we can use to evaluate how well a given combination of (system prompt + LLM) stimulates correct tool use.
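Such a bespoke check could be quite lightweight. The sketch below is hypothetical (the case format, the callModel callback and the XML-style tool-call check are all assumptions for illustration): send prompts that should trigger a specific tool and score whether the model's reply invokes that tool.

```typescript
// Hypothetical sketch of a "tool use benchmark"; the case format, the callModel
// callback and the XML-style tool-call check are assumptions for illustration.
interface ToolUseCase {
  prompt: string;       // user request that should trigger a tool call
  expectedTool: string; // tool we expect the model to invoke, e.g. "read_file"
}

async function scoreToolUse(
  callModel: (systemPrompt: string, userPrompt: string) => Promise<string>,
  systemPrompt: string,
  cases: ToolUseCase[],
): Promise<number> {
  let correct = 0;
  for (const c of cases) {
    const reply = await callModel(systemPrompt, c.prompt);
    // Crude check: does the reply contain a call to the expected tool?
    if (reply.includes(`<${c.expectedTool}>`)) correct++;
  }
  return correct / cases.length; // fraction of cases with correct tool selection
}
```

Running the same case set across (verbose prompt, condensed prompt) x (several models) would give a cheap comparison of tool-selection accuracy alongside token cost.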
-
This looks like a useful framework for regression testing of prompts, and evaluating tool use capabilities of different models:
-
Fully support the idea of making the system and user prompts dynamic.
Pain points
Proposal 1 - Local "Prompt Analyzer"
This sharply reduces token usage and keeps contradictory, off-topic content out of the prompt.
Request to the Roo Code team - "Prompt Preprocessor" API
We need an official hook between prompt construction and the call to the LLM.
How it could work
Benefits
MVP
Happy to help discuss the format or test a prototype.
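One possible shape for such a hook, offered purely as a discussion starter (the interface name, fields and section-matching regex below are invented here; this is not an existing Roo Code API):

```typescript
// Invented interface, offered as a discussion starter; not an existing Roo Code API.
interface PromptPreprocessor {
  // Called after Roo assembles the prompt and before the request is sent to the model.
  process(input: {
    systemPrompt: string;
    userMessage: string;
    mode: string; // e.g. "code", "architect"
  }): Promise<{ systemPrompt: string; userMessage: string }>;
}

// A trivial example: drop the MCP SERVERS section when the request doesn't mention MCP.
// (The section-matching regex is a placeholder for whatever structure the real prompt has.)
const dropMcpSectionWhenUnused: PromptPreprocessor = {
  async process({ systemPrompt, userMessage }) {
    if (/mcp/i.test(userMessage)) return { systemPrompt, userMessage };
    const trimmed = systemPrompt.replace(/^MCP SERVERS[\s\S]*?(?=^[A-Z][A-Z ]+$)/m, "");
    return { systemPrompt: trimmed, userMessage };
  },
};
```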
-
Related previous discussions:
#1335
#1000
#215
The default system prompt is currently around 11k tokens (per OpenAI tokenizer), and could be substantially reduced without loss of function.
Benefits of this are substantial:
Analysis of default system prompt
Here's an analysis of the 11k tokens in the default system prompt (code mode), and some options for what we can do with them, in length order.
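For anyone who wants to reproduce or re-check these per-section counts, a minimal token-counting sketch (assuming the js-tiktoken npm package is available and that cl100k_base is an acceptable proxy encoding for the models in question):

```typescript
// Minimal token-count sketch; assumes the js-tiktoken npm package is available and
// that the cl100k_base encoding is an acceptable proxy for the target models.
import { getEncoding } from "js-tiktoken";

function countTokens(text: string): number {
  const enc = getEncoding("cl100k_base");
  return enc.encode(text).length;
}

// e.g. compare countTokens(defaultSystemPrompt) vs. countTokens(condensedSystemPrompt)
```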
TOOL USE - 3.9k tokens
The bulk of this (3.2k tokens) covers descriptions & usage instructions of the 13 individual tools (250 tokens per tool). These descriptions are lengthy and can probably be optimised without loss of function.
Using an LLM, I have created compacted versions of these instructions with a combined length of 1.2k tokens (92 tokens per tool). Testing needed, but I suspect these descriptions would be adequate.
As a fallback, we could maintain the lengthier instructions in additional files that could be read by the LLM in the event it has difficulty operating a tool. Decision to be taken whether this is worthwhile, or whether there are alternative ways to mitigate the risk.
Further options would include:
MCP Servers - 3.2k tokens
The vast majority of this content concerns the creation of new MCP servers. This is a complex operation that will not be used in the vast majority of chats.
This could be moved into a separate file, with the LLM instructed to read this file only in the event the user asks to create a new MCP Server. This seems like a big & easy win.
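A sketch of what conditional assembly might look like (the section contents and option names below are placeholders, not Roo Code's real prompt sections):

```typescript
// Placeholder sections, for illustration only; not Roo Code's real prompt content.
const SECTIONS = {
  toolUse: "TOOL USE\n(condensed tool descriptions)",
  mcpOverview: "MCP SERVERS\n(connected servers only; no creation instructions)",
  rules: "RULES\n(default rules)",
  modes: "MODES\n(available modes and when to switch)",
};

// The ~3k tokens of "how to create an MCP server" guidance are deliberately absent;
// the model would load them on demand (e.g. via a get_instructions-style tool)
// only when the user actually asks to create a server.
function buildSystemPrompt(opts: { includeMcpOverview: boolean }): string {
  return [
    SECTIONS.toolUse,
    opts.includeMcpOverview ? SECTIONS.mcpOverview : "",
    SECTIONS.rules,
    SECTIONS.modes,
  ]
    .filter(Boolean)
    .join("\n\n");
}
```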
RULES - 1.6k tokens
The default rules are quite verbose, but most LLMs need to know this information most of the time, so there's no obvious easy saving here.
I suspect that the rule text could be edited to make it more compact / less verbose, without hurting functionality. An LLM could help with this summarization. However, testing whether these changes have an impact on compliance with the rules is a fairly tricky matter.
Potential here, but gains are not as easy as they are in other sections.
MODES - 755 tokens
The LLM needs to understand the different modes, so it can decide when to switch mode.
However, 500 of these tokens concern how to create new modes, which is a rare operation. This could be moved into a different file, to be loaded if/when the user asks to create a new mode.
Potentially, agents that don't have access to the "switch mode" function don't need to know about modes at all?
CAPABILITIES - 647 tokens
There is substantial overlap here with the tool descriptions, so there is likely scope to trim this section.
However, testing such changes to ensure no loss of capabilities is not a trivial matter.
SYSTEM INFORMATION - 210 tokens
The bulk of this (155 tokens) is a verbose description of how to use the list of filepaths, which can probably be condensed without loss of functionality.
OBJECTIVE - 441 tokens
This text is fairly verbose. ChatGPT suggested a 120 token version that it thinks will convey the same information.
(I used this prompt to get this)
"You will be provided with a rules that explains how to behave as an LLM interacting with an IDE as an AI code editor.
We want to encode these instructions in a way that makes it clear to an LLM what to do, but does so using fewer tokens, so that we preserve the LLM's context window for other purposes.
Please provide a revised desciption of the rules which will be as concise as possible, while still giving the LLM clear unambiguous advice about how to behave."
Again, careful testing would be needed to assess whether the condensed version is still giving the behaviour we want from Roo.
Potential savings
Easy:
- Replace the verbose tool descriptions with the condensed versions (3.2k down to ~1.2k tokens).
- Move the MCP server creation instructions out of the default prompt, loaded only when needed (~3k tokens).
- Move the mode creation instructions out of the default prompt in the same way (~0.5k tokens).
This takes us from 11k tokens down to 5.5k tokens.
Medium:
- Condense the RULES text.
- Trim the CAPABILITIES section where it overlaps with the tool descriptions.
- Condense the OBJECTIVE and SYSTEM INFORMATION text.
These changes potentially take us down to 3k-4k tokens.
Other modes
The analysis above focused on the "code" mode. The system prompts for other modes are very similar in length and content, and can probably be shortened in the same way.
Looking forward
As further features are added to Roo Code, there will likely be pressures to add further content to the default system prompt. Once we have done work to reduce the length of the prompt, we should also consider what is needed to prevent future bloat.
It may be sufficient to have some specific additional review guidelines for any text that ends up in the system prompt to ensure that it is (a) necessary, (b) as concise as possible and (c) only included when it is needed.
Proposed next steps
As immediate next steps, I propose 3 x PRs to tackle the "easy" potential savings detailed above:
Based on learning from the implementation and test of these PRs, we can then look at the further savings described as "medium" difficulty above. I suspect that learnings from the first 3 PRs will help a lot with direction here.
Comments / feedback very welcome. I plan on working on & submitting the 3 x PRs described above, so there will also be an opportunity for detailed feedback on the specifics of those proposals in the context of those PRs.