Optimize the bitset sampling algorithm.
The existing bitset sampling implementation uses a binomial distribution to decide how many bits to keep, randomly chooses the indices of those bits, sorts the vector of indices, and finally iterates over all the bits one by one to clear those not contained in the vector.

This can be very inefficient, in particular when sampling over large bit sets with a very low sampling rate. In that case, the list of indices to keep is roughly as large as the bitset itself, and sorting it requires O(n log(n)) time, which ends up being significant. Additionally, walking over every single bit, whether it is set or not and whether it will be cleared or not, is inefficient as well.

This commit optimizes the implementation in a few ways:

- Instead of sampling and sorting indices to keep, it randomly samples the size of the gaps between two successful Bernoulli trials. This was inspired by the [FastBernoulliTrial] class.
- When the sampling rate is higher than 1/2, it flips the sampling logic and uses the gaps between two unsuccessful trials, minimizing the number of loop iterations.
- Finally, to take full advantage of the gap lengths, it can quickly scan through the bitset to skip a given number of set bits, calling popcnt once per word rather than looking at each bit, and it can clear entire ranges of bits at once by overwriting entire words, rather than masking bits one by one.

This implementation is faster than the previous one across the entire parameter space. The difference is most drastic for very low sampling rates, where the new implementation is more than two orders of magnitude faster.

[FastBernoulliTrial]: https://searchfox.org/mozilla-central/rev/a6d25de0c706dbc072407ed5d339aaed1cab43b7/mfbt/FastBernoulliTrial.h
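
The three ideas above can be illustrated with a minimal sketch. This is not the code from this commit: it is written in Python for brevity, it models the bitset as a list of non-negative integers standing in for machine words, and the names `popcount`, `gap`, and `sample` are hypothetical. The gap length is drawn by inverting the geometric distribution, which is an assumption about how the gaps could be sampled rather than a description of the actual implementation.

```python
import math
import random


def popcount(word):
    """Number of set bits in a non-negative integer word."""
    return bin(word).count("1")


def gap(rng, p):
    """Number of non-events before the next event, for event probability p in (0, 1]."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is finite
    # Inverse-CDF draw from a geometric distribution: P(gap >= k) = (1 - p) ** k.
    return int(math.log(u) / math.log1p(-p))


def sample(words, rate, rng):
    """Return a copy of `words` in which each set bit survives with probability `rate`."""
    if rate >= 1.0:
        return list(words)
    if rate <= 0.0:
        return [0] * len(words)
    out = list(words)
    # Treat the rarer outcome as the "event": keeping a bit when rate <= 1/2,
    # clearing a bit otherwise. This keeps the number of gaps drawn small.
    keep_events = rate <= 0.5
    p = rate if keep_events else 1.0 - rate
    i = 0
    todo = words[0] if words else 0  # set bits of the current word not yet decided
    g = gap(rng, p)
    while i < len(words):
        n = popcount(todo)
        if n <= g:
            # The whole rest of this word lies inside the gap: one popcount,
            # and at most one write that clears every remaining bit.
            g -= n
            if keep_events:
                out[i] &= ~todo
            i += 1
            todo = words[i] if i < len(words) else 0
            continue
        # The next event falls in this word: pass over g set bits first.
        for _ in range(g):
            low = todo & -todo  # lowest undecided set bit
            todo ^= low
            if keep_events:
                out[i] ^= low  # non-event bits are cleared in this mode
        event = todo & -todo
        todo ^= event
        if not keep_events:
            out[i] ^= event  # in flipped mode the event bit is the one cleared
        g = gap(rng, p)
    return out
```

For example, `sample([0xFF, 0x0F], 0.25, random.Random(0))` returns two words whose set bits are a subset of the input's. The sketch draws one gap per event bit rather than one random decision per set bit, so at very low or very high rates most words are handled by a single popcount.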
Showing 2 changed files with 209 additions and 41 deletions.