[fix][broker] fix delay queue sequence issue. #24035

Merged
merged 1 commit into from
Mar 3, 2025

thetumbled (Member) commented on Feb 28, 2025

Motivation

When a group of delayed messages reaches its delivery timestamp, we expect them to be dispatched in message ID order.
However, there is a corner case that breaks this rule; it can be reproduced with the unit test testDelaySequence in this PR.
The root cause is that the inner map of delayedMessageMap is a Long2ObjectMap<Roaring64Bitmap> instead of a Long2ObjectSortedMap<Roaring64Bitmap>, so its keys (ledger IDs) are not guaranteed to be iterated in ascending order.
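To make the root cause concrete, here is a minimal sketch of draining a single timestamp bucket (ledgerId to entry IDs). It assumes java.util.HashMap and TreeMap as stand-ins for fastutil's hash-based Long2ObjectMap and Long2ObjectSortedMap, and TreeSet<Long> as a stand-in for Roaring64Bitmap (which also iterates its values in ascending order). BucketOrderSketch, drain and add are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: HashMap/TreeMap stand in for fastutil's Long2ObjectMap /
// Long2ObjectSortedMap, and TreeSet<Long> stands in for Roaring64Bitmap.
class BucketOrderSketch {

    // Flattens one timestamp bucket (ledgerId -> entryIds) into "ledgerId:entryId"
    // strings, in whatever order the inner map happens to iterate its keys.
    static List<String> drain(Map<Long, TreeSet<Long>> bucket) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, TreeSet<Long>> e : bucket.entrySet()) {
            for (long entryId : e.getValue()) {
                out.add(e.getKey() + ":" + entryId);
            }
        }
        return out;
    }

    // Adds one message ID to the bucket, creating the entry-ID set on demand.
    static void add(Map<Long, TreeSet<Long>> bucket, long ledgerId, long entryId) {
        bucket.computeIfAbsent(ledgerId, k -> new TreeSet<>()).add(entryId);
    }

    public static void main(String[] args) {
        Map<Long, TreeSet<Long>> hashed = new HashMap<>(); // no ordering guarantee
        Map<Long, TreeSet<Long>> sorted = new TreeMap<>(); // ascending ledger IDs
        long[][] ids = {{16, 0}, {1, 0}, {1, 1}};           // all due at the same time
        for (long[] id : ids) {
            add(hashed, id[0], id[1]);
            add(sorted, id[0], id[1]);
        }
        System.out.println("hash-based: " + drain(hashed)); // order is unspecified
        System.out.println("sorted map: " + drain(sorted)); // [1:0, 1:1, 16:0]
    }
}
```

Only the sorted variant guarantees that messages sharing a timestamp come out in ledger ID order; the hash-based variant may happen to look ordered for small keys, which is what makes this a corner case.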

Modifications

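Change the type of delayedMessageMap to Long2ObjectSortedMap<Long2ObjectSortedMap<Roaring64Bitmap>>, so that both the outer map (timestamps) and the inner map (ledger IDs) are iterated in ascending key order.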
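A minimal sketch of the fully sorted nested index, assuming java.util.TreeMap and TreeSet in place of Long2ObjectSortedMap and Roaring64Bitmap. SortedDelayIndexSketch and pollReady are hypothetical names, and this is not Pulsar's tracker code; it only shows why keeping every level sorted yields message ID order for messages due at the same time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: deliverAt -> ledgerId -> entryIds, sorted at every level
// (TreeMap/TreeSet standing in for Long2ObjectSortedMap/Roaring64Bitmap).
class SortedDelayIndexSketch {
    private final NavigableMap<Long, NavigableMap<Long, TreeSet<Long>>> delayed = new TreeMap<>();

    void addMessage(long ledgerId, long entryId, long deliverAt) {
        delayed.computeIfAbsent(deliverAt, t -> new TreeMap<>())
               .computeIfAbsent(ledgerId, l -> new TreeSet<>())
               .add(entryId);
    }

    // Removes and returns every message with deliverAt <= now as {ledgerId, entryId},
    // ordered by deliverAt, then ledgerId, then entryId.
    List<long[]> pollReady(long now) {
        List<long[]> ready = new ArrayList<>();
        Iterator<Map.Entry<Long, NavigableMap<Long, TreeSet<Long>>>> it =
                delayed.headMap(now, true).entrySet().iterator();
        while (it.hasNext()) {
            for (Map.Entry<Long, TreeSet<Long>> ledger : it.next().getValue().entrySet()) {
                for (long entryId : ledger.getValue()) {
                    ready.add(new long[] {ledger.getKey(), entryId});
                }
            }
            it.remove(); // headMap is a view, so this also drops the bucket from `delayed`
        }
        return ready;
    }

    public static void main(String[] args) {
        SortedDelayIndexSketch index = new SortedDelayIndexSketch();
        index.addMessage(2, 2, 1); // inserted out of order on purpose
        index.addMessage(0, 0, 1);
        index.addMessage(1, 1, 1);
        for (long[] id : index.pollReady(1)) {
            System.out.println(id[0] + ":" + id[1]); // 0:0, then 1:1, then 2:2
        }
    }
}
```

Insertion order does not matter here: the dispatch order is fixed by the sorted keys at each level.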

Verifying this change

  • Make sure that the change passes the CI checks.


This change added tests and can be verified as follows: the new unit test testDelaySequence reproduces the dispatch-order issue.



Matching PR in forked repository

PR in forked repository: thetumbled#72

lhotari (Member) left a review comment: LGTM

lhotari (Member) commented on Feb 28, 2025:

@thetumbled Is it #23611 where the regression was introduced? Does the previous implementation have a similar issue?

thetumbled (Member, Author) replied:

The previous implementation also has an ordering issue, but it is triggered in a different way. It sorts the triple (timestamp, ledgerId, entryId) with a heap sort algorithm, which is not a stable sort.
For example, the issue can occur with:

// addMessage(ledgerId, entryId, deliverAt)
tracker.addMessage(0, 0, 1);
tracker.addMessage(1, 1, 1);
tracker.addMessage(2, 2, 1);

These three messages are scheduled for delivery at the same time, but because of the sort algorithm they may not be dispatched in the order 0, 1, 2.
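A minimal sketch of the general point that a binary heap does not preserve insertion order among elements that compare as equal. It uses java.util.PriorityQueue and a hypothetical Msg record; the timestamp-only comparator is an assumption chosen to expose the tie, not a reproduction of the previous tracker's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustration of heap instability; Msg and both comparators are hypothetical.
class HeapTieSketch {
    record Msg(long deliverAt, long ledgerId, long entryId) {}

    // Pushes all messages into a binary heap and polls their ledger IDs back out.
    static List<Long> pollLedgerIds(List<Msg> msgs, Comparator<Msg> order) {
        PriorityQueue<Msg> heap = new PriorityQueue<>(order);
        heap.addAll(msgs);
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            out.add(heap.poll().ledgerId());
        }
        return out;
    }

    public static void main(String[] args) {
        // Three messages due at the same time, added in message ID order.
        List<Msg> msgs = List.of(new Msg(1, 0, 0), new Msg(1, 1, 1), new Msg(1, 2, 2));
        // Only the timestamp is compared, so ties leave the heap in heap order.
        Comparator<Msg> byTime = Comparator.comparingLong(Msg::deliverAt);
        // Ties broken by message ID (ledgerId, then entryId): a total order.
        Comparator<Msg> byTimeThenId = byTime
                .thenComparingLong(Msg::ledgerId)
                .thenComparingLong(Msg::entryId);
        System.out.println(pollLedgerIds(msgs, byTime));       // not necessarily [0, 1, 2]
        System.out.println(pollLedgerIds(msgs, byTimeThenId)); // [0, 1, 2]
    }
}
```

Breaking ties by message ID makes the comparison a total order, so the result no longer depends on how the heap rearranges equal elements.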

thetumbled merged commit 998bb51 into apache:master on Mar 3, 2025 (67 of 69 checks passed).
lhotari pushed a commit that referenced this pull request on Mar 5, 2025.