The BitSet class has been part of Java since version 1.0. It represents an array of bits that can be set or cleared, and it is useful wherever bit-level manipulation and flag handling are needed, thanks to its efficiency and compactness when dealing with large volumes of bit data. It sounds great, but there's a caveat: the native Java implementation of BitSet is not thread-safe. The authors themselves warn about this in the class's Javadoc: "A BitSet is not safe for multithreaded use without external synchronization."
I set out to implement my own thread-safe BitSet and want to show what came out of it. Important: the complete implementation will be included in the new version of my moonshine project, and you'll be able to check it out there. For now, let's consider only four methods of a thread-safe BitSet:
public interface ConcurrentBitSet {
boolean get(int bitIndex);
void set(int bitIndex);
void clear(int bitIndex);
void flip(int bitIndex);
}
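To give a sense of the intended usage, here is a small hypothetical example; it uses the lock-free NonBlockingConcurrentBitSet developed at the end of this article, but any of the implementations below could stand behind the interface:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentBitSetUsageExample {
    public static void main(String[] args) throws InterruptedException {
        // A shared set of "processed" flags, one bit per task id.
        ConcurrentBitSet processed = new NonBlockingConcurrentBitSet(1_000);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int taskId = 0; taskId < 1_000; taskId++) {
            int id = taskId;
            pool.submit(() -> processed.set(id)); // safe to call concurrently, no external synchronization
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        System.out.println(processed.get(42)); // true: every bit has been set by some worker
    }
}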
Below, I will show several ways to implement a thread-safe BitSet and compare them using the following tests:
- a thread safety correctness test, which demonstrates whether a specific implementation is free of data races;
- a benchmark, which measures the performance of each implementation in terms of average operation time.
The thread safety correctness test is written using the MultithreadedTC framework, while the benchmark is conducted using the Java Microbenchmark Harness (JMH).
To begin, let's run the thread safety correctness test against the native Java implementation of BitSet and verify both that it is indeed not thread-safe and that the test can detect this.
public class ConcurrentBitSetMultithreadingTest extends MultithreadedTest {
private BitSet nativeBitSet;
@Override
public void initialize() {
this.nativeBitSet = new BitSet(10);
}
public void thread1() throws InterruptedException {
this.nativeBitSet.set(1);
}
public void thread2() throws InterruptedException {
this.nativeBitSet.set(2);
}
@Override
public void finish() {
assertTrue(nativeBitSet.get(1) && nativeBitSet.get(2));
}
@Test
public void set_get_test() throws Throwable {
TestFramework.runManyTimes(new ConcurrentBitSetMultithreadingTest(), 1000);
}
}
In this test, two threads simultaneously execute the thread1 and thread2 methods, which set bits 1 and 2 respectively. If there are no data races, both updates survive. In the Java BitSet implementation, bits 1 and 2 belong to the same array element, so concurrent updates can conflict. The finish method checks bits 1 and 2: if both bits are set, assertTrue won't throw an exception; otherwise, the test fails.
Run against the native Java implementation of BitSet, this test is expected to fail, showing that the test does identify the thread safety issue.
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running io.github.ololx.samples.concurrent.bitset.ConcurrentBitsetTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.056 sec - in io.github.ololx.samples.concurrent.bitset.ConcurrentBitsetTest
Running io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec <<< FAILURE! - in io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest
set_get_test(io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest) Time elapsed: 0.018 sec <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest.finish(ConcurrentBitSetMultithreadingTest.java:28)
Results :
Failed tests:
ConcurrentBitSetMultithreadingTest.set_get_test:33->finish:28 null
Tests run: 17, Failures: 1, Errors: 0, Skipped: 0
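The reason for the failure is that BitSet.set is a plain read-modify-write of a whole 64-bit word, so two threads can read the same stale word and the later write-back erases the other thread's bit. A simplified sketch of the idea (illustrative only, not the exact JDK source):
class LostUpdateSketch {
    static long[] words = new long[1];

    // Roughly what a non-thread-safe set(bitIndex) does on the word holding the bit.
    static void set(int bitIndex) {
        long word = words[0];        // 1) read the whole 64-bit word
        word |= (1L << bitIndex);    // 2) modify a local copy
        words[0] = word;             // 3) write the word back
        // If two threads interleave steps 1-3 on the same word, the second
        // write-back overwrites the first, and one of the bits is lost.
    }
}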
The first and most obvious approach is to synchronize every access to the internal data structure, the array of bits. Effectively, it's enough to create a wrapper around a native BitSet instance in which every call to the instance's methods is synchronized.
public class ConcurrentBitSetWithFullSynchronization implements ConcurrentBitSet {
private final BitSet bitSet;
private final Object monitor;
public ConcurrentBitSetWithFullSynchronization(int size) {
this.bitSet = new BitSet(size);
this.monitor = new Object();
}
@Override
public boolean get(int bitIndex) {
synchronized (this.monitor) {
return this.bitSet.get(bitIndex);
}
}
@Override
public void set(int bitIndex) {
synchronized (this.monitor) {
this.bitSet.set(bitIndex);
}
}
@Override
public void clear(int bitIndex) {
synchronized (this.monitor) {
this.bitSet.clear(bitIndex);
}
}
@Override
public void flip(int bitIndex) {
synchronized (this.monitor) {
this.bitSet.flip(bitIndex);
}
}
}
Such an implementation is thread-safe and passes the thread safety correctness test. Note, however, that only one thread can be inside any method at a time, even when different parts of the BitSet are being accessed, so under heavy multi-threaded use high performance should not be expected. I won't present benchmark results for this implementation yet, since there's nothing to compare them against at the moment.
The main drawback of the first implementation is that every thread accessing any BitSet method is synchronized: threads form a sequential queue and wait for one another to finish. It is worth considering not blocking threads that only read the BitSet unless some thread is trying to modify it. A read-write lock is a suitable tool for implementing such an approach.
A read-write lock distinguishes between read operations and write operations:
- Read Lock: Multiple threads can hold a read lock simultaneously as long as no threads hold a write lock. This allows for concurrent read operations without blocking each other, which is useful when the data is being read more frequently than being written.
- Write Lock: Only one thread can hold a write lock at a time, and during this time, no other thread can hold either a read or write lock. This ensures that write operations are exclusive and don't interfere with each other or ongoing read operations.
The java.util.concurrent.locks package in Java provides the ReentrantReadWriteLock class, which implements a read-write lock. This kind of lock can help achieve better concurrency in scenarios where reads are considerably more frequent than writes.
To implement a thread-safe BitSet using Java's ReentrantReadWriteLock, it's enough to create a wrapper around the native BitSet instance. In this wrapper, we also need to initialize a ReadWriteLock instance and surround every call to the BitSet methods with the appropriate lock:
- A write-lock for methods that modify the BitSet's value.
- A read-lock for methods that read the BitSet's value.
public class ConcurrentBitSetWithGeneralRWLock implements ConcurrentBitSet {
private final BitSet bitSet;
private final ReadWriteLock readWriteLock;
public ConcurrentBitSetWithGeneralRWLock(int size) {
this.bitSet = new BitSet(size);
this.readWriteLock = new ReentrantReadWriteLock();
}
@Override
public boolean get(int bitIndex) {
return this.lockAndGet(bitIndex, () -> this.bitSet.get(bitIndex));
}
@Override
public void set(int bitIndex) {
this.lockAndSet(bitIndex, this.bitSet::set);
}
@Override
public void clear(int bitIndex) {
this.lockAndSet(bitIndex, this.bitSet::clear);
}
@Override
public void flip(int bitIndex) {
this.lockAndSet(bitIndex, this.bitSet::flip);
}
private Boolean lockAndGet(int bitIndex, Supplier<Boolean> getBitSupplier) {
this.readWriteLock.readLock()
.lock();
try {
return getBitSupplier.get();
} finally {
this.readWriteLock.readLock()
.unlock();
}
}
private void lockAndSet(int bitIndex, Consumer<Integer> setBitConsumer) {
this.readWriteLock.writeLock()
.lock();
try {
setBitConsumer.accept(bitIndex);
} finally {
this.readWriteLock.writeLock()
.unlock();
}
}
}
Such an implementation, in theory, should be more performant than the previous one, as not all threads would be synchronized at all times. Let's test this assumption in practice using a benchmark.
@State(Scope.Thread)
@Warmup(
iterations = 5,
time = 100,
timeUnit = TimeUnit.MILLISECONDS
)
@Measurement(
iterations = 5,
time = 100,
timeUnit = TimeUnit.MILLISECONDS
)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class ConcurrentBitSetBenchmark {
private static final int AVAILABLE_CPU = Runtime.getRuntime()
.availableProcessors();
private static final String FULL_SYNCHRONIZATION = "javaNativeWithSynchronizationByThis";
private static final String ONE_READ_WRITE_LOCK = "javaNativeWithOneReadWriteLock";
private ConcurrentBitSet concurrentBitSet;
@Param(
{
FULL_SYNCHRONIZATION,
ONE_READ_WRITE_LOCK
}
)
private String typeOfBitSetRealization;
private ExecutorService executor;
@Param({"10", "100", "1000"})
private int sizeOfBitSet;
@Param({"1", "10"})
private int countOfSetters;
@Param({"1", "10"})
private int countOfGetters;
public ConcurrentBitSetBenchmark() {}
public static void main(String[] args) throws RunnerException {
var options = new OptionsBuilder()
.include(ConcurrentBitSetBenchmark.class.getSimpleName())
.build();
new Runner(options).run();
}
@Setup
public void setup() {
switch (typeOfBitSetRealization) {
case FULL_SYNCHRONIZATION -> concurrentBitSet = new ConcurrentBitSetWithFullSynchronization(sizeOfBitSet);
case ONE_READ_WRITE_LOCK -> concurrentBitSet = new ConcurrentBitSetWithGeneralRWLock(sizeOfBitSet);
}
executor = Executors.newWorkStealingPool(AVAILABLE_CPU);
}
@TearDown
public void tearDown() {
executor.shutdown();
}
@Benchmark
public void get_set_benchmark(Blackhole blackhole) {
// One async task per reader and per writer; each task walks over the whole bit set.
var tasksCount = countOfGetters + countOfSetters;
var bitsetInvocations = new ArrayList<CompletableFuture<Void>>(tasksCount);
for (int getterNumber = 0; getterNumber < countOfGetters; getterNumber++) {
bitsetInvocations.add(CompletableFuture.runAsync(() -> {
for (int bitNumber = 0; bitNumber < sizeOfBitSet; bitNumber++) {
blackhole.consume(concurrentBitSet.get(bitNumber));
}
}, executor));
}
for (int setterNumber = 0; setterNumber < countOfSetters; setterNumber++) {
bitsetInvocations.add(CompletableFuture.runAsync(() -> {
for (int bitNumber = 0; bitNumber < sizeOfBitSet; bitNumber++) {
concurrentBitSet.set(bitNumber);
}
}, executor));
}
CompletableFuture.allOf(bitsetInvocations.toArray(CompletableFuture[]::new))
.join();
}
}
This test is executed with different combinations of parameters:
- typeOfBitSetRealization - the type of thread-safe BitSet implementation:
- javaNativeWithSynchronizationByThis - concurrent BitSet using synchronization;
- javaNativeWithOneReadWriteLock - concurrent BitSet using read-write lock;
- countOfGetters - the number of accesses for reading a specific bit from the BitSet (number of threads for reading);
- countOfSetters - the number of accesses for writing a specific bit to the BitSet (number of threads for writing);
- sizeOfBitSet - the number of bits in the BitSet, also determines the number of bits read/written by each thread.
Benchmark (countOfGetters) (countOfSetters) (sizeOfBitSet) (typeOfBitSetRealization) Mode Cnt Score Error Units
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithSynchronizationByThis avgt 15 7,599 ± 0,249 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithOneReadWriteLock avgt 15 16,029 ± 12,473 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithSynchronizationByThis avgt 15 26,445 ± 1,427 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithOneReadWriteLock avgt 15 33,103 ± 1,770 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithSynchronizationByThis avgt 15 13,650 ± 1,473 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithOneReadWriteLock avgt 15 15,289 ± 1,517 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithSynchronizationByThis avgt 15 171,227 ± 49,553 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithOneReadWriteLock avgt 15 101,555 ± 6,714 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithSynchronizationByThis avgt 15 12,510 ± 0,704 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithOneReadWriteLock avgt 15 16,947 ± 3,171 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithSynchronizationByThis avgt 15 157,836 ± 34,753 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithOneReadWriteLock avgt 15 155,556 ± 13,568 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithSynchronizationByThis avgt 15 21,713 ± 1,146 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithOneReadWriteLock avgt 15 23,875 ± 1,942 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithSynchronizationByThis avgt 15 550,413 ± 31,637 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithOneReadWriteLock avgt 15 220,764 ± 4,870 us/op
As a result, we can see that the read-write lock implementation outperforms the fully synchronized one under high contention, especially with larger bit sets and more frequent writes, while at low contention the difference is small or even reversed.
In the previous enhancement, we only restricted reading threads while another thread was writing to the BitSet. The next enhancement is based on the following idea: instead of locking access to the entire BitSet, synchronize access only when threads touch the same word in the array of bits. This approach is known as fine-grained locking. It's worth noting that in a BitSet all bits are stored in an array (in the native Java BitSet this is a long[]), where each array element represents a machine word, and two threads writing to adjacent array elements cannot interfere with each other, because word tearing is forbidden in the JVM.
The 'words' array needs a corresponding array of locks. The lock array should be no larger than the internal 'words' array, so that no single word ends up guarded by two different locks. The size of the lock array could be made a constructor parameter and tuned more finely, but for simplicity we'll create a lock array of the same length as the 'words' array.
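For illustration, here is how bit indices map to word indices (and thus to locks) when each word holds 64 bits; the wordIndex method in the implementation below does exactly this shift:
// The word index of a bit is bitIndex >> 6, i.e. bitIndex / 64.
int wordOfBit1 = 1 >> 6;   // = 0: bit 1 lives in word 0
int wordOfBit2 = 2 >> 6;   // = 0: bit 2 also lives in word 0, so it shares a lock with bit 1
int wordOfBit64 = 64 >> 6; // = 1: bit 64 lives in word 1 and is guarded by a different lock,
                           //      so threads touching bits 2 and 64 never wait for each other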
public class ConcurrentBitSetWithSegmentsRWLocks implements ConcurrentBitSet {
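// 2^6 = 64 bits per word: shifting a bit index right by 6 gives the index of its word and lock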
private static final int ADDRESS_BITS_PER_WORD = 6;
private final BitSet bitSet;
private final ReadWriteLock[] readWriteLocks;
public ConcurrentBitSetWithSegmentsRWLocks(int size) {
this.bitSet = new BitSet(size);
this.readWriteLocks = new ReadWriteLock[wordIndex(size - 1) + 1];
for (int index = 0; index < this.readWriteLocks.length; index++) {
this.readWriteLocks[index] = new ReentrantReadWriteLock();
}
}
@Override
public boolean get(int bitIndex) {
return this.lockAndGet(bitIndex, () -> this.bitSet.get(bitIndex));
}
@Override
public void set(int bitIndex) {
this.lockAndSet(bitIndex, this.bitSet::set);
}
@Override
public void clear(int bitIndex) {
this.lockAndSet(bitIndex, this.bitSet::clear);
}
@Override
public void flip(int bitIndex) {
this.lockAndSet(bitIndex, this.bitSet::flip);
}
private Boolean lockAndGet(int bitIndex, Supplier<Boolean> getBitSupplier) {
this.readWriteLocks[wordIndex(bitIndex)].readLock()
.lock();
try {
return getBitSupplier.get();
} finally {
this.readWriteLocks[wordIndex(bitIndex)].readLock()
.unlock();
}
}
private void lockAndSet(int bitIndex, Consumer<Integer> setBitConsumer) {
this.readWriteLocks[wordIndex(bitIndex)].writeLock()
.lock();
try {
setBitConsumer.accept(bitIndex);
} finally {
this.readWriteLocks[wordIndex(bitIndex)].writeLock()
.unlock();
}
}
private int wordIndex(int bitIndex) {
return bitIndex >> ADDRESS_BITS_PER_WORD;
}
}
Such an implementation passes the thread safety correctness test, and in theory it should be more performant than the previous implementations; the benchmark results below bear this out. In the new benchmark run, a third value of the typeOfBitSetRealization parameter is added:
- javaNativeWithSynchronizationByThis - concurrent BitSet using synchronization;
- javaNativeWithOneReadWriteLock - concurrent BitSet using read-write lock;
- javaNativeWithManyReadWriteLocksBySegments - concurrent BitSet using fine-grained locking.
Benchmark (countOfGetters) (countOfSetters) (sizeOfBitSet) (typeOfBitSetRealization) Mode Cnt Score Error Units
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithSynchronizationByThis avgt 15 8,049 ± 1,229 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithOneReadWriteLock avgt 15 8,357 ± 0,529 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 8,260 ± 0,337 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithSynchronizationByThis avgt 15 25,674 ± 1,121 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithOneReadWriteLock avgt 15 31,116 ± 0,803 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 25,550 ± 0,490 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithSynchronizationByThis avgt 15 12,312 ± 0,684 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithOneReadWriteLock avgt 15 13,336 ± 0,415 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 13,865 ± 0,943 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithSynchronizationByThis avgt 15 149,213 ± 37,374 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithOneReadWriteLock avgt 15 96,545 ± 1,655 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 82,410 ± 5,834 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithSynchronizationByThis avgt 15 12,124 ± 0,486 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithOneReadWriteLock avgt 15 15,382 ± 0,639 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 15,305 ± 0,345 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithSynchronizationByThis avgt 15 134,842 ± 22,531 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithOneReadWriteLock avgt 15 154,292 ± 5,257 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 58,527 ± 1,225 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithSynchronizationByThis avgt 15 20,826 ± 1,509 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithOneReadWriteLock avgt 15 22,992 ± 1,885 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 22,446 ± 0,687 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithSynchronizationByThis avgt 15 507,174 ± 17,105 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithOneReadWriteLock avgt 15 236,177 ± 42,689 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 148,687 ± 4,516 us/op
As a result, you can see that fine-grained locking has proven to be more performant than the two previous approaches, and the advantage becomes more noticeable as the size of the BitSet grows.
The previous implementation turned out to be even more performant, but it still queues threads up, forcing them to wait for one another. In addition, the overhead of lock contention and context switching grows as the number of waiting threads increases. Therefore, the emphasis in this new implementation is on eliminating locks entirely.
The requirements for the new implementation can be formulated as follows:
1. Any two threads working with different words should not be synchronized or queued (word tearing is forbidden in the JVM).
2. If multiple threads access the same word for read/write, there should be no reordering of operations (happens-before guarantees between threads should be provided).
3. Multiple readers of the same word index should not be blocked unless there is a writer among them.
Looking at these requirements, (1) is already guaranteed by the JVM and was exploited in the previous implementation, and for (2)-(3) we need volatile semantics for our 'words' array. Unfortunately, we can't simply declare the 'words' array as volatile and be done with it: even if an array is declared volatile, that doesn't provide volatile semantics for reading or writing its elements; concurrent access to the k-th element would still require external synchronization, because volatile applies only to the array reference itself. However, we can implement our own array with atomic features (similar to the atomic data types), which fulfills both (2) and (3): it gives us volatile read operations and non-blocking updates of values through CAS operations.
class AtomicWordArray {
private static final VarHandle WORDS_VAR_HANDLER = MethodHandles.arrayElementVarHandle(byte[].class);
private final byte[] words;
private AtomicWordArray(int size) {
this.words = new byte[size];
}
private void setWordVolatile(int wordIndex, Function<Byte, Byte> binaryOperation) {
byte currentWordValue;
byte newWordValue;
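// Optimistic retry loop: read the current word, apply the update function to a local copy,
// and publish it with CAS; if another thread changed the word in the meantime, retry with the fresh value.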
do {
currentWordValue = this.getWordVolatile(wordIndex);
newWordValue = binaryOperation.apply(currentWordValue);
} while (!WORDS_VAR_HANDLER.compareAndSet(this.words, wordIndex, currentWordValue, newWordValue));
}
private byte getWordVolatile(int wordIndex) {
return (byte) WORDS_VAR_HANDLER.getVolatile(this.words, wordIndex);
}
}
I intentionally use a byte[] as the 'words' array: this way an 8-bit BitSet occupies a single byte instead of a whole 64-bit word, and the bit manipulation logic stays simple. However, nothing prevents us from implementing a similar atomic array over long[]. Now it's enough to apply the AtomicWordArray in the lock-free BitSet implementation and write the implementations of the ConcurrentBitSet methods.
public class NonBlockingConcurrentBitSet implements ConcurrentBitSet {
private static final int WORD_SIZE = Byte.SIZE;
private final AtomicWordArray data;
public NonBlockingConcurrentBitSet(int size) {
this.data = new AtomicWordArray((size + WORD_SIZE - 1) / WORD_SIZE);
}
@Override
public boolean get(int bitIndex) {
int wordIndex = bitIndex / WORD_SIZE;
int bitOffset = bitIndex % WORD_SIZE;
int bitMask = 1 << bitOffset;
return (this.data.getWordVolatile(wordIndex) & bitMask) >> bitOffset != 0;
}
@Override
public void set(int bitIndex) {
int wordIndex = bitIndex / WORD_SIZE;
int bitOffset = bitIndex % WORD_SIZE;
int bitMask = 1 << bitOffset;
this.data.setWordVolatile(wordIndex, word -> (byte) (word | bitMask));
}
@Override
public void clear(int bitIndex) {
int wordIndex = bitIndex / WORD_SIZE;
int bitOffset = bitIndex % WORD_SIZE;
int bitMask = 1 << bitOffset;
this.data.setWordVolatile(wordIndex, word -> (byte) (word & ~bitMask));
}
@Override
public void flip(int bitIndex) {
int wordIndex = bitIndex / WORD_SIZE;
int bitOffset = bitIndex % WORD_SIZE;
int bitMask = 1 << bitOffset;
this.data.setWordVolatile(wordIndex, word -> (byte) (word ^ bitMask));
}
private static class AtomicWordArray {
private static final VarHandle WORDS_VAR_HANDLER = MethodHandles.arrayElementVarHandle(byte[].class);
private final byte[] words;
private AtomicWordArray(int size) {
this.words = new byte[size];
}
private void setWordVolatile(int wordIndex, Function<Byte, Byte> binaryOperation) {
byte currentWordValue;
byte newWordValue;
do {
currentWordValue = this.getWordVolatile(wordIndex);
newWordValue = binaryOperation.apply(currentWordValue);
} while (!WORDS_VAR_HANDLER.compareAndSet(this.words, wordIndex, currentWordValue, newWordValue));
}
private byte getWordVolatile(int wordIndex) {
return (byte) WORDS_VAR_HANDLER.getVolatile(this.words, wordIndex);
}
}
}
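As a quick sanity check of the bit arithmetic and a usage example, here is a small hypothetical demo (the class name and values are mine, for illustration only):
public class NonBlockingBitSetDemo {
    public static void main(String[] args) {
        NonBlockingConcurrentBitSet bits = new NonBlockingConcurrentBitSet(16); // (16 + 8 - 1) / 8 = 2 bytes of storage

        bits.set(10);                      // wordIndex = 10 / 8 = 1, bitOffset = 10 % 8 = 2, bitMask = 0b100
        System.out.println(bits.get(10));  // true
        System.out.println(bits.get(11));  // false: the other bits of word 1 are untouched

        bits.flip(10);                     // XOR with the mask toggles the bit back to 0
        System.out.println(bits.get(10));  // false
    }
}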
This implementation passes the thread safety correctness test. In the new benchmark run, a fourth value of the typeOfBitSetRealization parameter is added:
- javaNativeWithSynchronizationByThis - concurrent BitSet using synchronization;
- javaNativeWithOneReadWriteLock - concurrent BitSet using read-write lock;
- javaNativeWithManyReadWriteLocksBySegments - concurrent BitSet using fine-grained locking;
- NonBlockingConcurrentBitset - lock-free concurrent BitSet based on CAS.
Benchmark (countOfGetters) (countOfSetters) (sizeOfBitSet) (typeOfBitSetRealization) Mode Cnt Score Error Units
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithSynchronizationByThis avgt 15 7,560 ± 0,346 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithOneReadWriteLock avgt 15 8,263 ± 0,256 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 8,368 ± 0,479 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 64 NonBlockingConcurrentBitset avgt 15 6,820 ± 0,316 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithSynchronizationByThis avgt 15 26,041 ± 1,395 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithOneReadWriteLock avgt 15 32,026 ± 0,840 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 25,453 ± 0,910 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 1 640 NonBlockingConcurrentBitset avgt 15 12,267 ± 0,258 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithSynchronizationByThis avgt 15 11,902 ± 0,655 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithOneReadWriteLock avgt 15 13,446 ± 0,759 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 13,747 ± 0,906 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 64 NonBlockingConcurrentBitset avgt 15 9,903 ± 0,560 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithSynchronizationByThis avgt 15 153,956 ± 34,896 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithOneReadWriteLock avgt 15 95,176 ± 3,605 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 83,124 ± 2,602 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 1 5 640 NonBlockingConcurrentBitset avgt 15 44,346 ± 2,026 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithSynchronizationByThis avgt 15 12,377 ± 0,795 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithOneReadWriteLock avgt 15 15,112 ± 0,397 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 16,468 ± 1,364 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 64 NonBlockingConcurrentBitset avgt 15 7,733 ± 0,290 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithSynchronizationByThis avgt 15 168,127 ± 34,236 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithOneReadWriteLock avgt 15 151,939 ± 4,203 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 59,806 ± 3,437 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 1 640 NonBlockingConcurrentBitset avgt 15 16,633 ± 0,574 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithSynchronizationByThis avgt 15 19,994 ± 0,483 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithOneReadWriteLock avgt 15 22,420 ± 0,471 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 javaNativeWithManyReadWriteLocksBySegments avgt 15 22,747 ± 1,589 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 64 NonBlockingConcurrentBitset avgt 15 11,279 ± 0,949 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithSynchronizationByThis avgt 15 507,401 ± 18,371 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithOneReadWriteLock avgt 15 218,627 ± 4,848 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 javaNativeWithManyReadWriteLocksBySegments avgt 15 137,042 ± 10,853 us/op
ConcurrentBitSetBenchmark.get_set_benchmark 5 5 640 NonBlockingConcurrentBitset avgt 15 48,743 ± 2,908 us/op
As a result, it's evident that the lock-free implementation outperforms all the previous ones, while avoiding the compromises present in the second and third implementations. It's also worth noting that with low or no contention (1 setter, 1 getter, and 64 bits), all implementations exhibit similar performance.