What are segments? What is their purpose? #957


Open

Ddystopia opened this issue Mar 19, 2025 · 1 comment

Comments

@Ddystopia
Contributor

I read the docs and the source code, but I can't really understand what these segments are or why they are stored for so long. You probably discussed this internally, but the repo doesn't contain any info beyond "Segments are used to store the published messages."

We tried it on a machine with roughly 250 MB of RAM and were shocked by ~10 MB/hour of "leaks". After several hours of debugging rumqttd (valgrind cannot make sense of what happens to a reference-counted allocation), we found out that the segments were configured improperly.

Questions:

  1. What is the trade-off between segment count and segment length? How should users choose between one big segment and many smaller ones?
  2. Why do segments keep storing messages until they are saturated? The code of readv is:
            let o = self.data[idx as usize..limit as usize]
                .iter()
                .cloned()
                .zip(offsets);
            out.extend(o);

So in the case of publishes, it clones the (reference-counted) body and topic, and those are only dropped in apply_retention. If a message is not retained, why store it longer than it takes to send it to all currently connected clients? Instead, the broker just keeps messages until segments are saturated (memory that could have been used productively by other processes) and frees them gradually once full.
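To make the memory behavior concrete, here is a minimal sketch using `std::sync::Arc` as a stand-in for rumqttd's reference-counted payloads (the actual broker uses `Bytes`; the counts and names here are illustrative only). Cloning is cheap, but the allocation survives until the *segment's* copy is dropped by retention, which is exactly why valgrind reports no leak while RSS keeps growing:

```rust
use std::sync::Arc;

fn main() {
    // A published message body, reference-counted like the broker's payloads.
    let body: Arc<Vec<u8>> = Arc::new(vec![0u8; 1024]);

    // readv-style fan-out: each connected client gets a cheap clone
    // (a refcount bump, not a copy of the 1 KiB buffer).
    let outbox: Vec<Arc<Vec<u8>>> = (0..3).map(|_| Arc::clone(&body)).collect();
    assert_eq!(Arc::strong_count(&body), 4); // segment's copy + 3 clients

    // Clients finish sending and drop their clones...
    drop(outbox);
    assert_eq!(Arc::strong_count(&body), 1);

    // ...but the allocation itself stays alive until the segment drops its
    // copy in apply_retention, so memory appears to "leak" until segments
    // saturate.
    drop(body);
}
```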

I think this is suboptimal, because I noticed that rumqttd delivers messages noticeably later when it has to do that retention work. It does it just before appending a new message:

    #[inline]
    pub fn append(&mut self, message: T) -> (u64, u64) {
        self.apply_retention();

But if retention were applied before the message arrived, clients wouldn't have to wait for apply_retention to finish.
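One way to keep retention off the delivery path is to hand the message to the log first and reclaim afterwards. The sketch below is a hypothetical restructuring, not rumqttd's actual code: the `Segment` type, count-based limit, and `pop_front` policy are all simplifications (real retention is byte-based and must keep offsets stable):

```rust
use std::collections::VecDeque;

// Hypothetical toy segment: append the new message first so readers can be
// served immediately, then reclaim old entries off the latency-critical path.
struct Segment {
    data: VecDeque<Vec<u8>>,
    max_len: usize,
}

impl Segment {
    fn append(&mut self, message: Vec<u8>) {
        self.data.push_back(message); // publish first
        self.apply_retention();       // reclaim after, not before
    }

    // Simplified retention: drop the oldest entries once over the limit.
    fn apply_retention(&mut self) {
        while self.data.len() > self.max_len {
            self.data.pop_front();
        }
    }
}

fn main() {
    let mut seg = Segment { data: VecDeque::new(), max_len: 3 };
    for i in 0..5u8 {
        seg.append(vec![i]);
    }
    // Only the newest `max_len` messages remain.
    assert_eq!(seg.data.len(), 3);
    assert_eq!(seg.data.front().unwrap()[0], 2);
}
```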

@Ddystopia
Contributor Author

After another session with valgrind I think I got it: max_segment_size and max_segment_count are applied to every "filter" separately, so you can't really cap total memory usage without enumerating all possible filters upfront and summing across them.
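If the limits really are per filter, the worst-case footprint is a product, not a single cap. A back-of-the-envelope sketch (all values here are hypothetical, chosen only to show the arithmetic):

```rust
fn main() {
    // Hypothetical config values; the limits apply per filter, not globally.
    let max_segment_size: u64 = 10 * 1024 * 1024; // 10 MiB per segment
    let max_segment_count: u64 = 10;              // segments kept per filter
    let active_filters: u64 = 25;                 // distinct subscription filters

    // Worst case is the product across all three dimensions.
    let worst_case = max_segment_size * max_segment_count * active_filters;
    assert_eq!(worst_case, 2_621_440_000); // ~2.4 GiB, far above any per-filter cap
}
```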
