FoundationDB

How to prevent fdbcli from monitoring the connfile?


(Caleb Spare) #1

Since I started using FoundationDB, one of its most idiosyncratic operational characteristics has been how it watches its configuration files and responds to changes. This has surprised me a few times, but today I really got thrown for a loop.

A cluster I was using started giving this warning every time I connected:

$ fdbcli -C fdb.cluster
Using cluster file `fdb.cluster'.

The database is available, but has issues (type 'status' for more information).

Welcome to the fdbcli. For help, type `help'.
fdb> status

Using cluster file `fdb.cluster'.

1 client(s) reported: Cluster file contents do not match current cluster connection string. Verify the cluster file and its parent directory are writable and that the cluster file has not been overwritten externally.

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 5
...

I checked every fdbcli process and every configuration I could think of (or so I thought), and after about 45 minutes I was really pulling my hair out. Eventually I found that status details gave me an IP address, and from that I figured out that another laptop I’d run fdbcli on still had that session open. In the meantime I’d switched to another git branch on that laptop, which removed fdb.cluster out from underneath fdbcli.

This behavior is really surprising for two reasons:

  • If I run some software with a configuration file that tells it what servers to connect to, I don’t expect that it will monitor that file after startup and care if it changes.
  • I don’t expect that the particular state of some fdbcli process on some laptop somewhere causes a warning to appear for everyone who connects to the cluster (as though there’s some problem with the cluster itself).

So, two questions (or maybe feature requests):

  1. Is it possible to run fdbcli in a way that loads the config once and then doesn’t monitor the file? I tried tricking fdbcli this way but it didn’t work:

    $ fdbcli -C <(cat fdb.cluster) 
    Unexpected error loading cluster file `/proc/self/fd/11': 1516 File too large to be read
    
  2. Is there any way to suppress warnings caused by fdbcli sessions that are connected to the cluster? If we end up deploying FDB at my company I imagine that the “Cluster file contents do not match current …” warnings will be semi-permanent due to stray fdbcli connections.


(A.J. Beamon) #2

This is done because each process tries to keep its cluster file up-to-date with respect to the current connection string. If you change the coordinators of your cluster, for example, then every process will update its cluster file. If the file isn’t current and the process restarts, it may not be able to connect to the cluster upon restart.
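To make the connection string concrete, here is a small sketch that lists the coordinators named in a cluster file. It assumes the usual `description:ID@IP:PORT,IP:PORT,...` layout and that lines starting with `#` are comments; `list_coordinators` is a hypothetical helper name, not part of FoundationDB.

```shell
#!/bin/sh
# Hypothetical helper: print one coordinator address per line.
# Assumes the connection string has the form
#   description:ID@IP:PORT,IP:PORT,...
# and that lines beginning with '#' are comments.
list_coordinators() {
    grep -v '^[[:space:]]*#' "$1" |   # drop comment lines
        grep -v '^[[:space:]]*$' |    # drop blank lines
        head -n 1 |                   # the connection string itself
        sed 's/^[^@]*@//' |           # strip "description:ID@"
        tr ',' '\n'                   # one address per line
}

# Usage:
#   list_coordinators fdb.cluster
#   list_coordinators fdb.cluster | wc -l
```

Comparing the line count against the "Coordinators" figure in status is a quick way to spot a file that has fallen out of date.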

A common cause for getting this error is if the cluster file isn’t writable. Another is if the file gets changed by an external process. The latter can be a problem for multiple reasons. One is what I said above: having an inaccurate cluster file can prevent the process from reconnecting on restart. The second is that a change to the coordinators in the cluster will cause the process to overwrite the file. If you intended to change the cluster file externally, your process may actually end up undoing your work. Additionally, if the cluster file was changed in a way that connects to a different, unintended cluster, then there could be a variety of negative consequences if that process restarts.
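The writability cause is easy to rule out from the shell. The following is a minimal sketch, not an official tool: `check_cluster_file` is a hypothetical helper that tests whether the file exists and whether both it and its parent directory are writable by the current user.

```shell
#!/bin/sh
# Hypothetical helper: report whether a cluster file and its parent
# directory are writable by the current user.
check_cluster_file() {
    file="$1"
    dir=$(dirname -- "$file")
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    if [ ! -w "$file" ]; then
        echo "not writable: $file"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "parent directory not writable: $dir"
        return 1
    fi
    echo "ok: $file"
}

# Usage:
#   check_cluster_file fdb.cluster
```

Run it as the same user that runs the client or server process, since the check only reflects the permissions of whoever invokes it.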

The status command is intended to give an overall picture of health and operation of the cluster. Based on what I described above, we’ve considered a cluster file problem to be indicative of some sort of configuration or administrative problem that can lead to bad consequences. In that light, we want to make sure that this is apparent in the cluster’s status and can be corrected. fdbcli is basically like any other client, so it behaves similarly.

There’s a similar issue when upgrading that can arise from having a process left connected to the cluster like this. If you are using the multi-version client, it’s usually a good idea to check that all connected clients are properly prepared for the upgrade before going through with it. I believe an orphaned fdbcli will appear as an older client and you’d have to hunt it down in that case too.

As for the two questions, I don’t know of a way to do either currently. It’s possible this topic could fold into the broader discussion about the future of cluster files and connection strings (see https://github.com/apple/foundationdb/issues/567 and various topics on the forums). Keep in mind, though, that one thing you may want to do with fdbcli is change the coordinators, and in that case it’s often helpful that it then writes the new cluster file for you.
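For the specific case of a cluster file that lives in a git working tree, one partial workaround is to point fdbcli at a private copy in a writable temporary directory. This is only a sketch and does not disable monitoring: fdbcli would still watch and rewrite the copy, and a coordinator change would update the copy rather than the original. `private_cluster_copy` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: copy a cluster file into a fresh temporary
# directory and print the path of the copy. The copy is independent of
# the original, so switching branches does not remove it.
private_cluster_copy() {
    src="$1"
    dir=$(mktemp -d) || return 1
    cp -- "$src" "$dir/fdb.cluster" || return 1
    echo "$dir/fdb.cluster"
}

# Usage (requires fdbcli on PATH):
#   fdbcli -C "$(private_cluster_copy fdb.cluster)"
```

The temporary directory is left behind on purpose so that fdbcli can keep writing to it; remove it once the session is closed.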