FoundationDB

Is it safe to have multiple cluster handles in the same process that use the same cluster file?


(Christophe Chevalier) #1

I have a process that has two independent modules that can connect to either the same fdb cluster, or two different clusters depending on configuration (one would be for general purpose data, and the other for centralized logging).

The implementation of each module is completely separate and they don’t share any code (apart from the fdb binding), so they currently create their own Cluster and Database handle, and read the cluster file path from two different configuration sections.

In the case where both target the same fdb cluster, they will both have the same cluster file path. So if the list of coordinators changes on that cluster, both FDBCluster instances may want to update the cluster file “at the same time”.

Looking at the client code, it seems that monitorLeaderOneGeneration(..) in MonitorLeader.actor.cpp listens for leader change notifications and calls ClusterConnectionFile::setConnectionString(..) with the new string. This creates a tmp file with the new connection string and swaps it with the old .cluster file “atomically”. It then checks the content of the file to see whether it matches what was just written.
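
As a rough illustration of the write, swap, and verify sequence described above, here is a minimal Python sketch. It is not the client's actual C++ code: the function name `set_connection_string_sketch`, the temp-file naming, and the sample connection string are assumptions made for the example.

```python
import os
import tempfile


def set_connection_string_sketch(cluster_file: str, conn_str: str) -> bool:
    """Write conn_str to a temp file, swap it over cluster_file, then verify.

    Hypothetical helper for illustration only; not part of any fdb binding.
    """
    directory = os.path.dirname(os.path.abspath(cluster_file))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cluster-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(conn_str)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites the destination; on POSIX this is an atomic rename.
        os.replace(tmp_path, cluster_file)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Read the file back and compare it with what was just written.
    with open(cluster_file) as f:
        return f.read() == conn_str
```

A `False` return here would mean that something else rewrote the file between the swap and the read-back, which is the situation the question is concerned about.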

I was not able to confirm this, but if I create two cluster handles, they will both call monitorLeader(...), so both may respond to the same event and independently try to update the same cluster file.

Since both cluster handles share the same network thread, can I safely assume that, because the fdb client is single-threaded, the two actors won’t try to update the same file concurrently? At worst, the same file will be overwritten twice with the same content.

What would happen if the cluster file changes twice in very rapid succession, and both leader monitor actors fire but in some weird order that leaves the intermediate version of the cluster file on disk?

Is this a scenario where having multiple Cluster handles per process with the same cluster file is a bad thing? Should I try to dedup this at the application level, or maybe even at the binding level (i.e. if a second cluster uses the same path as the first one, reuse that handle but create a second database handle, to at least have some isolation between both modules)?
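
To make the dedup idea concrete, here is a minimal Python sketch of an application-level cache keyed by the resolved cluster file path. The class `ClusterHandleCache` and the injected `open_handle` callable are hypothetical names for this example; the callable stands in for whatever the binding uses to create a cluster handle.

```python
import os
import threading
from typing import Callable, Dict, Generic, TypeVar

H = TypeVar("H")


class ClusterHandleCache(Generic[H]):
    """Hand out one shared handle per distinct cluster file (hypothetical helper)."""

    def __init__(self, open_handle: Callable[[str], H]) -> None:
        # open_handle is an assumption: any function that opens a handle for a path.
        self._open_handle = open_handle
        self._handles: Dict[str, H] = {}
        self._lock = threading.Lock()

    def get(self, cluster_file: str) -> H:
        # Resolve symlinks and relative segments so equivalent paths share a key.
        key = os.path.realpath(cluster_file)
        with self._lock:
            if key not in self._handles:
                self._handles[key] = self._open_handle(key)
            return self._handles[key]
```

Each module would then ask the cache for its cluster handle and create its own database handle from it, so that only one handle per cluster file exists in the process.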


(David Scherer) #2

Sharing the cluster file is definitely the norm - for example multiple fdbserver processes share the file in a typical server installation. But a bug here is not inconceivable, because for example the simulator probably can’t simulate fine-grained concurrency in this code (atomic replace is probably “atomic” to it). So remain vigilant, I guess. I don’t remember the relevant code well enough to answer your specific questions with confidence.


(A.J. Beamon) #3

I do believe that it is correct that the cluster file is updated on the network thread, and the atomic replace code isn’t concurrent (i.e. it doesn’t yield, so no other work on the network thread could be done in the middle of it). As a result, I wouldn’t expect that you’d have to worry about simultaneous filesystem operations to the cluster file from a single process.

One possible exception to this may be when using the multi-version client. Each version of the client has its own network thread, and though I don’t think you’ll see updates to the cluster file from each of them normally, it might be possible to have multiple updates simultaneously if a coordinator change is timed well relative to an upgrade.

I agree with Dave’s thoughts regarding multiple processes using the same cluster file. It should be noted that if the cluster file ends up with the wrong contents, then the process(es) using it will report this fact to the cluster and you will see a message in status indicating so.

One issue that can come up with multiple processes is in the following scenario where coordinators are changed multiple times:

  1. Process 1 updates cluster file to A
  2. Process 1 updates cluster file to B
  3. Process 2 updates cluster file to A
  4. Process 2 dies

If process 1 and process 2 share the same cluster file, you could end up with the wrong contents at the end. As described above, process 1 should report to the cluster that its cluster file is incorrect. If process 2 reconnected, it would update the cluster file so long as some process from coordinator set A was still in the cluster.
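
The interleaving above can be replayed in a few lines of Python. This is only a simulation under stated assumptions: both "processes" are plain function calls in one script, the helper names are hypothetical, and the two connection strings are made-up samples.

```python
import os
import tempfile

# Made-up connection strings for coordinator sets A and B.
SET_A = "test:abc123@10.0.0.1:4500,10.0.0.2:4500"
SET_B = "test:abc123@10.0.0.3:4500,10.0.0.4:4500"


def replace_file(path: str, contents: str) -> None:
    """Write to a temp file, then swap it over the target."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(contents)
    os.replace(tmp, path)


def file_matches(path: str, expected: str) -> bool:
    """The check a live process can use to notice a wrong cluster file."""
    with open(path) as f:
        return f.read() == expected


def run_scenario(path: str) -> bool:
    replace_file(path, SET_A)  # 1. process 1 updates the cluster file to A
    replace_file(path, SET_B)  # 2. process 1 updates the cluster file to B
    replace_file(path, SET_A)  # 3. process 2, lagging behind, writes A
    # 4. process 2 dies; process 1 still believes the coordinators are B
    return file_matches(path, SET_B)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_scenario(os.path.join(d, "fdb.cluster")))
```

Every individual swap is atomic, yet the file ends up holding A, so the mismatch check made from process 1's point of view fails.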

The above scenario could probably happen in a single process case as well, but it would likely involve the death of both clients since they share a process. In that case, it wouldn’t be much different than if you completed step 1 and died immediately afterward.


(Christophe Chevalier) #4

Thanks for the explanation.

It looks to me that the best practice when multiple processes run on the same host is to have one instance of fdb.cluster for each process, instead of all of them using the global default path.

Different apps could each have their own fdb.cluster in their working directory (they could target different clusters anyway). For the same app spawning multiple concurrent processes, either each has its own copy of the file, or they share the same one and we hope for the best?
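
For the "each has its own copy" option, a minimal Python sketch could look like this. The helper name `private_cluster_file` and the `fdb.<pid>.cluster` naming scheme are assumptions for illustration, not an fdb convention.

```python
import os
import shutil
from typing import Optional


def private_cluster_file(shared_path: str, work_dir: str, pid: Optional[int] = None) -> str:
    """Copy the shared cluster file to a per-process path and return that path.

    Hypothetical helper: the process would pass the returned path to the
    binding instead of the shared one.
    """
    pid = os.getpid() if pid is None else pid
    os.makedirs(work_dir, exist_ok=True)
    private_path = os.path.join(work_dir, f"fdb.{pid}.cluster")
    shutil.copyfile(shared_path, private_path)
    return private_path
```

One caveat, as far as I can tell: a private copy is only kept current while its owning process is running, so a copy left behind by a long-dead process may name outdated coordinators.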