While reading the code to better understand internal recovery, this line confused me:
if (trs.size() ||
(commitData.db->get().recoveryState >= RecoveryState::ACCEPTING_COMMITS &&
masterLifetime.isEqual(commitData.db->get().masterLifetime) && lastCommitComplete.isReady())) {…}
Why is the condition (commitData.db->get().recoveryState >= RecoveryState::ACCEPTING_COMMITS) allowed to be bypassed?
After checking the older code, in which MasterProxyServer handled every type of transaction request (GRV and Commit), I think the RecoveryState check for client transactions is in the function transactionStarter, which in the current version lives in GrvProxyServer:
ASSERT(db->get().recoveryState >= RecoveryState::ACCEPTING_COMMITS);
and there is a wait in grvProxyServerCore before it calls transactionStarter:
while (!(masterLifetime.isEqual(grvProxyData.db->get().masterLifetime) &&
grvProxyData.db->get().recoveryState >= RecoveryState::ACCEPTING_COMMITS)) {
wait(grvProxyData.db->onChange());
}
In CommitProxyServer's commitProxyServerCore, the purpose of the recovery-state check in my original question is not to decide whether it is OK to accept client transactions, but whether an empty transaction batch is allowed, so that makes sense. Also, the ClusterController's clusterRecoveryCore sends a CommitTransactionRequest to proxy[0] and TxnStateRequests to the proxies, so CommitProxy needs to handle them in order for the RecoveryState in the CC to move forward to ACCEPTING_COMMITS. But that specific CommitTransactionRequest is not from a client. This aligns with the documentation.
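To restate my understanding in code, here is a minimal sketch of the batching decision quoted at the top; the function and parameter names are mine for illustration, not taken from FDB:

bool shouldCommitBatch(int numTransactions,          // trs.size()
                       RecoveryState recoveryState,  // commitData.db->get().recoveryState
                       bool sameMasterLifetime,      // masterLifetime.isEqual(...)
                       bool lastCommitComplete) {    // lastCommitComplete.isReady()
    // A non-empty batch always proceeds; this is how the recovery
    // transaction from the ClusterController gets committed before the
    // cluster reaches ACCEPTING_COMMITS.
    if (numTransactions > 0)
        return true;
    // An empty batch (used only to advance the committed version) is
    // deferred until this generation has fully recovered.
    return recoveryState >= RecoveryState::ACCEPTING_COMMITS && sameMasterLifetime && lastCommitComplete;
}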
There is one more issue I couldn't find an answer to: it seems that GrvProxyServer will reply to read-version requests without checking the current RecoveryState when a GrvProxy stays in the proxy list of the recovered cluster.
When a GrvProxyServer is recruited for the first time there is no such issue, because when it runs grvProxyServerCore it hits the blocking wait above; once the recovery state lets it past that wait, it starts the transactionStarter actor to serve GetReadVersionRequests. So far so good.
The function grvProxyServer monitors dbInfo changes, so when the current proxy is removed from the proxy interfaces in dbInfo, the proxy terminates. This is good.
But if it is still in the current dbInfo's proxy list after a recovery, then the loop in transactionStarter will keep replying to GetReadVersionRequests no matter what the recoveryState is.
I searched the code for any other place that checks recoveryState to block client transactions during recovery, but found nothing.
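To make the shape concrete, here is a heavily condensed sketch of how I read transactionStarter; everything except the placement of the ASSERT is elided or invented:

ACTOR Future<Void> transactionStarterSketch(Reference<AsyncVar<ServerDBInfo> const> db) {
    // The recovery-state check runs exactly once, before the serving loop.
    ASSERT(db->get().recoveryState >= RecoveryState::ACCEPTING_COMMITS);
    loop {
        // ... batch pending GetReadVersionRequests and reply with a read
        // version; no per-request recoveryState check happens in here ...
        wait(delay(0)); // stands in for the real batching/timing logic
    }
}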
Could anyone share ideas on the steps to create a workload that tests the scenario I described? Thanks.
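Concretely, what I imagine is a skeleton like the following, assuming the TestWorkload interface from fdbserver/workloads (the workload name and everything inside start() are made up, and the exact virtual signatures vary between FDB versions):

struct GrvDuringRecoveryWorkload : TestWorkload {
    GrvDuringRecoveryWorkload(WorkloadContext const& wcx) : TestWorkload(wcx) {}
    std::string description() const override { return "GrvDuringRecovery"; }
    Future<Void> start(Database const& cx) override {
        // 1. Spawn an actor that calls Transaction::getReadVersion() in a loop.
        // 2. Spawn an actor that repeatedly forces recoveries (e.g. by
        //    killing the cluster controller or master in simulation).
        // 3. Record any read-version reply that arrives after the old
        //    generation's tlogs are locked but before the new generation
        //    reaches ACCEPTING_COMMITS.
        return Void();
    }
    Future<bool> check(Database const& cx) override { return true; }
    void getMetrics(std::vector<PerfMetric>& m) override {}
};
WorkloadFactory<GrvDuringRecoveryWorkload> GrvDuringRecoveryWorkloadFactory("GrvDuringRecovery");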
> But if it is still in the current dbInfo's proxy list after a recovery, then the loop in transactionStarter will keep replying to GetReadVersionRequests no matter what the recoveryState is.
After a recovery happens, in the new generation the ClusterController recruits a whole new transaction system, including new GRV proxies, new commit proxies, and new tlogs. So it is not possible for an old GRV proxy to be part of the new transaction system.
Additionally, before that recruitment the old tlogs have been locked, which prevents transactions from the previous generation from committing. That is, even though the old GRV proxies may linger for a short period of time, their transactions can't be committed.
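To illustrate the invariant, here is a toy model (not FDB code, just the idea): once the recovering generation locks a tlog at a newer epoch, any commit stamped with an older epoch is refused, so a lingering old-generation proxy can never complete a commit.

#include <algorithm>
#include <cstdint>

struct ToyTLog {
    uint64_t lockedEpoch = 0; // highest epoch that has locked this tlog

    void lock(uint64_t newEpoch) { lockedEpoch = std::max(lockedEpoch, newEpoch); }

    bool tryCommit(uint64_t proxyEpoch) const {
        return proxyEpoch >= lockedEpoch; // old-generation commits are rejected
    }
};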
For details on how recovery works, see the detailed documentation.
Thanks for the confirmation that all the proxies are new. My confusion comes from this piece of code:
ACTOR Future<Void> checkRemoved(Reference<AsyncVar<ServerDBInfo> const> db,
                                uint64_t recoveryCount,
                                GrvProxyInterface myInterface) {
    loop {
        // Terminate only when a newer generation exists (recoveryCount has
        // advanced past the one this proxy was recruited in) AND this
        // proxy's interface is no longer in dbInfo's GRV proxy list.
        if (db->get().recoveryCount >= recoveryCount &&
            std::find(db->get().client.grvProxies.begin(), db->get().client.grvProxies.end(), myInterface) ==
                db->get().client.grvProxies.end()) {
            throw worker_removed();
        }
        wait(db->onChange());
    }
}
After checking the code that creates new proxies, it seems a new proxy's interface can never equal an old one. So my guess is that the timing is the point: a freshly recruited proxy may briefly see a stale dbInfo that does not list it yet, so checkRemoved only terminates a proxy once recoveryCount has advanced AND its interface is gone from dbInfo, at which point the old proxy is definitely obsolete.