Parallel PutChunk RPC calls in WriteMV #1806
Conversation
```go
} else if slices.Contains(rvsWritten, rv.Name) {
	log.Debug("ReplicationManager::WriteMV: Skipping RV %s/%s (state %s) as it was already written to: %v",
		rv.Name, req.MvName, rv.State, rvsWritten)

	common.Assert(retryCnt > 0)
	common.Assert(rv.State == string(dcache.StateOnline) ||
		rv.State == string(dcache.StateSyncing), rv.Name, rv.State, rvsWritten)

	//
	// Skip writing to this RV, as it was previously written to.
	// Send a nil response on the response channel to indicate that
	// we are not writing to this RV.
	//
	responseChannel <- nil
```
It's not safe to omit replicas that were written to in some previous attempt.
Imagine a replica goes online -> offline -> online during the course of one write. When it was online the first time, we wrote a chunk; then it went offline and joined back. Rejoining would have caused the chunk that was written to be cleared. Now this replica doesn't have the chunk, and we won't write it, since as per rvsWritten we have already written it.
We should consider all the replica writes corresponding to one MV write as a transaction performed in a given cluster state (the entire write transaction must complete in one cluster state). Only when we complete a write transaction in one cluster state are we guaranteed that, if the cluster changes to some other state (one or more component RVs of this MV going offline and/or being replaced by other RVs), the data will be properly synced.
If all the replicas agree with the cluster state, and confirm that by not failing any replica write with NeedToRefreshClusterMap, then we can consider that write transaction successful.
If some replica writes are taken from a previous cluster state and some from later ones, the entire MV write transaction cannot be considered successful.
Anyway, it's OK to repeat some component writes in the rare case of an MV changing in the middle of a write.
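A minimal sketch of the retry shape this comment argues for, under stated assumptions: writeAllReplicas, refreshClusterMap, and ErrNeedToRefreshClusterMap are hypothetical stand-ins for the project's real RPCs and status codes, not its actual API. The point it illustrates is that on any NeedToRefreshClusterMap failure, the whole set of replica writes is redone under the refreshed clustermap, rather than carrying rvsWritten across cluster states.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-in for the real NeedToRefreshClusterMap RPC status.
var ErrNeedToRefreshClusterMap = errors.New("NeedToRefreshClusterMap")

// writeAllReplicas stands in for issuing PutChunk to every component RV of
// the MV under one clustermap epoch; it fails if any replica rejects the epoch.
func writeAllReplicas(epoch int) error {
	// ... issue all replica writes in parallel, collect responses ...
	return nil
}

// refreshClusterMap stands in for fetching the latest clustermap epoch.
func refreshClusterMap(epoch int) int { return epoch + 1 }

// writeMV treats all replica writes for one MV write as a single transaction:
// the whole set must succeed under one cluster state, otherwise the entire
// set is retried under the refreshed state (repeating some writes is safe).
func writeMV(epoch, maxRetries int) error {
	for retry := 0; retry <= maxRetries; retry++ {
		err := writeAllReplicas(epoch)
		if err == nil {
			return nil // transaction completed within a single cluster state
		}
		if !errors.Is(err, ErrNeedToRefreshClusterMap) {
			return err
		}
		// Cluster state changed mid-write: refresh and redo ALL replica
		// writes, deliberately not skipping RVs written in prior attempts.
		epoch = refreshClusterMap(epoch)
	}
	return fmt.Errorf("writeMV: clustermap kept changing after %d retries", maxRetries)
}

func main() {
	fmt.Println(writeMV(1, 3))
}
```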
Force-pushed from 255d6cc to 5bdbfc0:
Make sure RefreshClusterMap refreshes to the desired clustermap epoch.
Force-pushed from 910a593 to 428f521.
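The commit message above implies RefreshClusterMap must land on at least a specific epoch, not merely "the latest map it happens to fetch". A hedged sketch of that idea, with hypothetical names (fetchClusterMap, clusterMap.Epoch) that do not come from the project: poll until the fetched epoch reaches the desired one, so a stale fetch cannot satisfy the refresh.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical clustermap type; the real project's structure will differ.
type clusterMap struct{ Epoch int64 }

// fetchClusterMap stands in for however the clustermap is actually fetched.
func fetchClusterMap() clusterMap { return clusterMap{Epoch: 42} }

// refreshClusterMap refreshes until the local view reaches at least
// desiredEpoch, guaranteeing callers never act on a map older than the
// one that triggered the refresh.
func refreshClusterMap(desiredEpoch int64, timeout time.Duration) (clusterMap, error) {
	deadline := time.Now().Add(timeout)
	for {
		cm := fetchClusterMap()
		if cm.Epoch >= desiredEpoch {
			return cm, nil
		}
		if time.Now().After(deadline) {
			return cm, fmt.Errorf("clustermap epoch %d still below desired %d",
				cm.Epoch, desiredEpoch)
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	cm, err := refreshClusterMap(40, time.Second)
	fmt.Println(cm.Epoch, err)
}
```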
Type of Change
Description
Send PutChunk RPC calls in parallel in WriteMV to improve performance.
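A minimal sketch of the parallelization the description refers to, assuming hypothetical stand-in types and functions (rvInfo, putChunkResponse, putChunk) rather than the project's real ones: one goroutine per RV issues PutChunk, and responses are fanned in over a buffered channel, where a nil entry marks a skipped RV as in the diff above.

```go
package main

import "fmt"

// Hypothetical stand-ins for the project's RV and PutChunk response types.
type rvInfo struct{ Name string }

type putChunkResponse struct {
	rvName string
	err    error
}

// putChunk stands in for the real PutChunk RPC to one RV.
func putChunk(rv rvInfo) *putChunkResponse {
	return &putChunkResponse{rvName: rv.Name}
}

func main() {
	rvs := []rvInfo{{"rv0"}, {"rv1"}, {"rv2"}}

	// Buffered so no goroutine blocks on send; one slot per RV.
	responseChannel := make(chan *putChunkResponse, len(rvs))

	// Fan out: issue every PutChunk concurrently instead of one at a time.
	for _, rv := range rvs {
		go func(rv rvInfo) {
			responseChannel <- putChunk(rv)
		}(rv)
	}

	// Fan in: exactly one response per RV; a nil would mean "skipped".
	for range rvs {
		if resp := <-responseChannel; resp != nil {
			fmt.Printf("PutChunk to %s done, err=%v\n", resp.rvName, resp.err)
		}
	}
}
```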