
The handling logic for the replication_reconnection_retries configuration option seems to have some issues #2157



Open
forgottener opened this issue Mar 4, 2025 · 1 comment


@forgottener

Version in use: v1.37.0

Scenario: Synchronizing MySQL 8 binlog changes to RabbitMQ, using --replication_reconnection_retries=10

Problem: When the MySQL server restarts, the Maxwell log shows the following:

```
2025-03-04 11:13:46,967 INFO  BinlogConnectorReplicator - Binlog disconnected.
2025-03-04 11:13:46,996 WARN  BinlogConnectorReplicator - replicator stopped at position: mysql-bin.021123:300511507 -- restarting
2025-03-04 11:13:46,997 INFO  BinlogConnectorReplicator - Reconnection attempt: 1 of 10
2025-03-04 11:13:47,000 INFO  BinlogConnectorReplicator - Reconnection attempt: 2 of 10
2025-03-04 11:13:47,008 INFO  BinlogConnectorReplicator - Reconnection attempt: 3 of 10
2025-03-04 11:13:47,010 INFO  BinlogConnectorReplicator - Reconnection attempt: 4 of 10
2025-03-04 11:13:47,012 INFO  BinlogConnectorReplicator - Reconnection attempt: 5 of 10
2025-03-04 11:13:47,013 INFO  BinlogConnectorReplicator - Reconnection attempt: 6 of 10
2025-03-04 11:13:47,016 INFO  BinlogConnectorReplicator - Reconnection attempt: 7 of 10
2025-03-04 11:13:47,020 INFO  BinlogConnectorReplicator - Reconnection attempt: 8 of 10
2025-03-04 11:13:47,021 INFO  BinlogConnectorReplicator - Reconnection attempt: 9 of 10
2025-03-04 11:13:47,023 INFO  BinlogConnectorReplicator - Reconnection attempt: 10 of 10
2025-03-04 11:13:47,031 INFO  TaskManager - Stopping 3 tasks
2025-03-04 11:13:47,032 ERROR TaskManager - cause: 
java.util.concurrent.TimeoutException: Maximum reconnection attempts reached.
```

Based on the log timestamps, all 10 retries happen within about 26 ms (from 11:13:46,997 to 11:13:47,023). The source of com.zendesk.maxwell.replication.BinlogConnectorReplicator#tryReconnect on the latest master branch is as follows:

```java
private void tryReconnect() throws TimeoutException {
	int reconnectionAttempts = 0;

	while ((reconnectionAttempts += 1) <= this.replicationReconnectionRetries || this.replicationReconnectionRetries == 0) {
		try {
			LOGGER.info(String.format("Reconnection attempt: %s of %s", reconnectionAttempts, replicationReconnectionRetries > 0 ? this.replicationReconnectionRetries : "unlimited"));
			client.connect(5000);
			return;
		} catch (IOException | TimeoutException ignored) { }
	}
	throw new TimeoutException("Maximum reconnection attempts reached.");
}
```

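The connection exceptions are caught and silently discarded, and the next attempt starts immediately, so the loop never actually waits 5000 milliseconds. The 5000 passed to client.connect(5000) seems to bound how long a single attempt may take, rather than acting as a pause between attempts.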
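To make the timing concrete, here is a simplified, hypothetical model in Java. It is not BinaryLogClient's actual source; it only assumes that connect(timeout) runs the attempt in the background and returns as soon as the attempt succeeds or fails, so the timeout is an upper bound rather than a delay. The names TimeoutBoundedConnect, Attempt and retryWithoutDelay are made up for illustration. Under that assumption, a server that refuses connections instantly exhausts all 10 retries in a few milliseconds, which matches the log above.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model (NOT BinaryLogClient's real code) of a connect(timeout) that
// waits at most timeoutMs, but is released early if the attempt fails.
class TimeoutBoundedConnect {

    /** Hypothetical stand-in for one raw connection attempt. */
    interface Attempt {
        void run() throws IOException;
    }

    static void connect(Attempt attempt, long timeoutMs) throws IOException, TimeoutException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<IOException> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                attempt.run();
            } catch (IOException e) {
                failure.set(e);
            } catch (RuntimeException e) {
                failure.set(new IOException(e));
            } finally {
                done.countDown(); // release the caller on success OR failure
            }
        });
        worker.setDaemon(true);
        worker.start();
        boolean finished;
        try {
            finished = done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while connecting", e);
        }
        if (!finished) {
            throw new TimeoutException("no connection within " + timeoutMs + " ms");
        }
        if (failure.get() != null) {
            throw failure.get(); // an instant failure surfaces here, long before the timeout
        }
    }

    /** Same loop shape as tryReconnect(): no pause between attempts. Returns attempts used. */
    static int retryWithoutDelay(Attempt attempt, int retries) throws TimeoutException {
        int attempts = 0;
        while ((attempts += 1) <= retries || retries == 0) {
            try {
                connect(attempt, 5000);
                return attempts;
            } catch (IOException | TimeoutException ignored) { }
        }
        throw new TimeoutException("Maximum reconnection attempts reached.");
    }

    public static void main(String[] args) {
        // Simulates "connection refused" while MySQL is restarting: every attempt fails instantly.
        Attempt refused = () -> { throw new IOException("Connection refused"); };
        long start = System.nanoTime();
        try {
            retryWithoutDelay(refused, 10);
        } catch (TimeoutException e) {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(e.getMessage() + " after " + elapsedMs + " ms");
        }
    }
}
```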

Question: Should the BinaryLogClient connection attempt here be synchronous and blocking? As it stands, when each attempt fails immediately, all the retries are used up before MySQL has had a chance to come back, so the reconnection logic cannot do its job.

@osheroff
Collaborator

Sorry for the delay, maintainer in Europe. What's happening is that an IOException is thrown immediately, 10 times in a row, and then it bails. I'd definitely take a patch that has it sleep a small amount between requests, e.g. wait a minimum of maybe 500 ms between attempts?
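For illustration, a patch along those lines might look like the following hypothetical sketch. It keeps the original loop condition (retries == 0 still means unlimited) and the same `throws TimeoutException` signature, but sleeps before every retry after the first. The 500 ms minimum comes from the comment above; the doubling backoff, the 30 s cap, and the names ReconnectWithDelay, Connector and delayBeforeAttempt are assumptions for the example, not existing Maxwell code or settings.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a tryReconnect() that pauses between attempts.
// MIN_DELAY_MS follows the ~500 ms minimum suggested above; the doubling
// backoff and MAX_DELAY_MS cap are illustrative assumptions.
class ReconnectWithDelay {

    /** Hypothetical stand-in for client.connect(5000). */
    interface Connector {
        void connect() throws IOException, TimeoutException;
    }

    static final long MIN_DELAY_MS = 500;
    static final long MAX_DELAY_MS = 30_000;

    /** Pause before the given 1-based attempt: 0, then min, 2*min, 4*min, ... capped at max. */
    static long delayBeforeAttempt(int attempt, long minDelayMs, long maxDelayMs) {
        if (attempt <= 1) {
            return 0; // the first attempt runs immediately, as it does today
        }
        int shift = Math.min(attempt - 2, 30); // keep the shift small enough not to overflow
        return Math.min(maxDelayMs, minDelayMs << shift);
    }

    /** Returns the number of attempts used; retries == 0 keeps the original "unlimited" meaning. */
    static int tryReconnect(Connector client, int retries, long minDelayMs, long maxDelayMs)
            throws TimeoutException {
        int attempts = 0;
        while ((attempts += 1) <= retries || retries == 0) {
            long delay = delayBeforeAttempt(attempts, minDelayMs, maxDelayMs);
            if (delay > 0) {
                try {
                    Thread.sleep(delay); // give the MySQL server time to come back
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    throw new TimeoutException("Interrupted while waiting to reconnect.");
                }
            }
            try {
                client.connect();
                return attempts;
            } catch (IOException | TimeoutException e) {
                // Report the failure instead of silently swallowing it, then retry.
                System.err.println("Reconnection attempt " + attempts + " failed: " + e.getMessage());
            }
        }
        throw new TimeoutException("Maximum reconnection attempts reached.");
    }

    public static void main(String[] args) throws TimeoutException {
        int[] calls = {0};
        // Fails twice (as if MySQL were still restarting), then succeeds.
        Connector flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection refused");
            }
        };
        int used = tryReconnect(flaky, 10, MIN_DELAY_MS, MAX_DELAY_MS);
        System.out.println("Connected after " + used + " attempts");
    }
}
```

Running main() should print "Connected after 3 attempts" on stdout after roughly 1.5 s of sleeping (500 ms + 1000 ms). Doubling the delay instead of sleeping a fixed 500 ms is just one option: it keeps the first retries quick while making it less likely that all attempts are spent during a longer MySQL restart.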
