I mean the Generation settings area for the model, such as the context size and so on. Can retrieval quality be improved by tuning those?
Here are some assumptions: I would expect that the larger the document split (chunk) size, the better the model understands each excerpt. But the chunk size times the number of retrieved chunks has to fit in the context window, so setting it as high as the model allows should benefit retrieval, right?
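To illustrate the budget I have in mind, here's a rough sketch; all the numbers and names are made-up examples, not settings from any particular tool:

```python
# Rough arithmetic for fitting retrieved chunks into the context window.
# All values below are hypothetical, just to show the trade-off.
context_window = 4096        # model's maximum context, in tokens
prompt_and_question = 300    # tokens reserved for the prompt template + question
answer_budget = 512          # tokens reserved for the model's reply

available = context_window - prompt_and_question - answer_budget

chunk_size = 512             # document split size, in tokens
n_chunks = available // chunk_size  # how many excerpts fit alongside everything else

print(f"room for {n_chunks} chunks of {chunk_size} tokens")  # -> 6
```

So raising the chunk size buys more coherent excerpts but fewer of them, which is the trade-off I'm asking about.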
On the other hand, in my experience the chat memory of previous messages often does more harm than good for RAG, and that also depends on the context size. For example, I've made a list of questions about the topic of my document to quickly assess the models' proficiency, but asking them all in succession confuses the bot, since they cover different parts of the original document and look random and unrelated to each other. Some models even directly ask me whether I want to continue discussing the previous question or switch topics.
Is there a way to reduce or disable chat memory for the purposes of RAG?
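What I'm after is effectively answering each question statelessly, something like this sketch (the `retrieve` and `generate` callables are hypothetical placeholders, just to show the idea):

```python
# Hypothetical sketch: answer each question with a fresh prompt,
# so previous turns can't leak in and confuse retrieval.
def answer(question: str, retrieve, generate) -> str:
    chunks = retrieve(question)      # top-k excerpts for this question only
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)          # note: no chat history passed in
```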
Thirdly, I think temperature should be kept at 0 to make sure the LLM only uses the provided context to answer questions instead of its imagination. However, it's hard to notice the effect of this setting, as some models continue to hallucinate even at 0, while others fail to come up with anything at high settings.
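For reference, this is the kind of configuration I mean; the parameter names follow the common OpenAI-style convention and may be spelled differently in other backends:

```python
# Hypothetical greedy-decoding settings for grounded RAG answers.
generation_config = {
    "temperature": 0.0,  # always pick the most likely next token
    "top_p": 1.0,        # nucleus truncation is irrelevant at temperature 0
    "max_tokens": 512,   # cap the answer length
}
```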
Any other settings that affect RAG that I should know about?