Doubts regarding fdb.cluster file

Just so I don’t have to keep going back and forth between here and our docs, here’s the cluster file format:

description:ID@ip1:port1,ip2:port2,ip3:port3
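To make the format concrete, here is a minimal Python sketch that splits one such line into its three parts. The names `ClusterFile` and `parse_cluster_line` are made up for illustration and are not part of any FDB binding. It assumes plain `ip:port` entries and ignores comments or any other address forms a real cluster file might contain.

```python
from typing import List, NamedTuple, Tuple


class ClusterFile(NamedTuple):
    """Hypothetical container for the three parts of a cluster file line."""
    description: str
    cluster_id: str
    coordinators: List[Tuple[str, int]]


def parse_cluster_line(line: str) -> ClusterFile:
    """Parse 'description:ID@ip1:port1,ip2:port2,...' (illustrative only)."""
    prefix, at, coord_part = line.strip().partition("@")
    if not at:
        raise ValueError("expected '@' between the ID and the coordinators")

    description, colon, cluster_id = prefix.partition(":")
    if not colon or not description or not cluster_id:
        raise ValueError("expected 'description:ID' before '@'")

    coordinators = []
    for entry in coord_part.split(","):
        # Split on the last ':' so the host part is kept whole.
        host, colon, port = entry.strip().rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError(f"bad coordinator entry: {entry!r}")
        coordinators.append((host, int(port)))

    return ClusterFile(description, cluster_id, coordinators)
```

For example, `parse_cluster_line("mydb:abc123@10.0.0.1:4500,10.0.0.2:4500")` yields the description `mydb`, the ID `abc123`, and two coordinator addresses.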

To connect to an FDB cluster, you must have a matching description. In some sense, the description should be used to track the progress of a single cluster through time as machines are added and removed.*

Every time the coordinators are changed, the ID should be updated, since the ID uniquely identifies the coordinator set. However, if you try to connect to the cluster using an older ID (after it has been changed), the server will give you the newer ID and the client then updates its cluster file. The updated file can be served by any of the old coordinators, even ones that are no longer coordinators, as long as they are still part of the cluster.

The reason for this is to allow for changing the coordinators midstream without downtime. When the coordinator change happens, the update is propagated to all connected clients and they update their files. Any dormant client will also pick up the change when it wakes back up. The problem case happens when all of the coordinators are changed and removed from the cluster. (This might happen if, say, the cluster is moved to an entirely different set of hardware.) In that case, any client that doesn’t connect to the database between the cluster file being changed and the old coordinators being removed from the cluster will be permanently unable to connect unless it can get the updated file from someone else.

But it’s a little more stringent than just letting the client connect regardless of the ID. For example, if you take your cluster file copy (i.e., fdb_client.cluster) and just randomly change the ID, I believe you’ll find that you can’t connect. Likewise, if you randomly change one of the coordinators in the file, you shouldn’t be able to connect even if that process is in the cluster.

So, I believe the minimal requirements are:

  1. The description must exactly match the description in the servers’ cluster files.
  2. The ID must match either the current or a previous ID used by the cluster (assuming at least one coordinator from when that ID was the current ID is still in the cluster).
  3. The coordinator set should match the coordinator set associated with the ID.

Or something along those lines.
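To spell out how those three rules interact, here is a toy Python model. It is only a sketch of the rules as I stated them above, not FDB's actual connection logic. Everything in it is hypothetical: `may_connect`, the `id_history` mapping from each ID the cluster has used to the coordinator set for that ID, and `live_processes` as the set of addresses still in the cluster.

```python
def may_connect(client_desc, client_id, client_coords,
                cluster_desc, id_history, live_processes):
    """Toy model of the three rules above (not real FDB behavior)."""
    # Rule 1: the description must match exactly.
    if client_desc != cluster_desc:
        return False

    # Rule 2: the ID must be the current ID or one the cluster used before.
    coords_for_id = id_history.get(client_id)
    if coords_for_id is None:
        return False

    # Rule 3: the coordinator set must be the one associated with that ID.
    if set(client_coords) != set(coords_for_id):
        return False

    # Caveat on rule 2: at least one coordinator from that ID's era
    # must still be in the cluster to hand out the newer file.
    return any(addr in live_processes for addr in coords_for_id)


# Hypothetical history: the coordinators moved from {a, b} to {c, d}.
history = {
    "old": {"a:4500", "b:4500"},
    "new": {"c:4500", "d:4500"},
}
still_running = {"b:4500", "c:4500", "d:4500"}

stale_client_ok = may_connect("mydb", "old", ["a:4500", "b:4500"],
                              "mydb", history, still_running)
random_id_ok = may_connect("mydb", "bogus", ["a:4500", "b:4500"],
                           "mydb", history, still_running)
```

In this model the stale client can still connect because `b:4500` is still around, while the client with a made-up ID cannot. Removing `b:4500` from `still_running` reproduces the problem case where every old coordinator has left the cluster.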


* But if you change all of the machines, is it really the same cluster?