Upgrading FoundationDB

The administration documentation describes upgrades as taking “seconds to minutes” of downtime while the servers in the cluster are restarted with the new version. This raises a few questions:

  1. In practice, what is the strategy for upgrading large production clusters, and what is the observed availability impact?
  2. From the documentation, it seems that new and old versions are not wire-protocol compatible; is this correct? As an example, some databases like Riak and ElasticSearch support doing a rolling upgrade where servers are upgraded one at a time.
  3. Is there a difference between upgrading major and minor numbered version releases?
  4. Does DR support replication between versions? For example, a common strategy for upgrading PostgreSQL is to use logical replication to the new version and then do a quick switchover; can DR be used for this?
  5. Is there any plan for a zero-downtime upgrade strategy?

You are correct that major/minor releases of FoundationDB are always protocol incompatible, and patch releases are always protocol compatible. This means that rolling upgrades are not an option between major/minor versions. Hopefully I can provide a little bit of insight into why we made this choice.

The first consideration is that the master, proxies, resolvers, and transaction logs act as a unit in FoundationDB. If any process recruited into one of those roles fails, we recruit new instances of all of them. This recruitment is generally completed in less than a second, so a machine failure does not have too big of a latency impact, but a rolling upgrade would cause one recovery per machine rebooted.

The second consideration is that, because FoundationDB has so many specialized roles, the protocol is very complex. Testing interactions between different versions communicating with each other would not be trivial; to begin with, we would need to develop methods for deterministically running simulations across two different binaries. We obsessively test everything, including our upgrades, and we are not going to support a feature we cannot test rigorously.

Finally, even without rolling upgrades, the latency impact of our current upgrade process is generally less than the latency impact of a machine failure. Ongoing client operations are typically delayed by less than a second while an upgrade is happening. Also, because of how rigorously we test upgrades, we do not have to worry about bringing down the database when upgrading.

Upgrades with FoundationDB happen in three steps:

  • First, load the new client library into your clients (https://apple.github.io/foundationdb/api-general.html#multi-version-client-api). This gives clients the protocol for both the old and new versions; they will attempt to connect with both versions simultaneously, so as soon as the servers are upgraded the clients will automatically be able to connect.

  • Second, load the new version of the fdbserver binary onto the server machines. fdbmonitor will continue running the old version; however, the next time it restarts a process it will use the new binary.

  • Finally, use fdbcli to force all the processes in the cluster to reboot at the same time. This is accomplished with the kill command (see the sketch after this list).
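
As a rough sketch of what that last step can look like (the cluster file path is just the default location; adjust for your deployment), the bounce can be done from a single fdbcli invocation:

fdbcli -C /etc/foundationdb/fdb.cluster --exec "kill; kill all; status"

The bare kill populates fdbcli’s list of cluster processes, kill all then asks every one of them to restart at the same time, and status shows the cluster coming back. For the first step, the new client library is typically made available through the bindings’ multi-version client options, for example (assuming your environment sets network options via environment variables) by pointing FDB_NETWORK_OPTION_EXTERNAL_CLIENT_LIBRARY at the new libfdb_c.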

The result of this process is that all servers change to the new version within milliseconds of each other, and clients can connect to the cluster as soon as the processes come back online. Generally, both the transaction logs and storage servers can recover their state from disk in about 100 ms, and once they are done recovering, the rest of the recovery process takes less than a second. The only thing to watch out for is to avoid doing an upgrade while the transaction logs have large queues. Basically, do not upgrade while the cluster is saturated or while it is recovering from a machine failure.
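
If it helps, a quick way to check for that condition (again assuming the default cluster file location) is to look at the cluster status before issuing the kill; status details also reports per-process information, so you can see whether the cluster is saturated or still recovering:

fdbcli -C /etc/foundationdb/fdb.cluster --exec "status details"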

Currently DR does not work between versions, but that should be very easy to add. DR is implemented as an external process, so we just need to integrate it with the multi-version client API.

Besides what Evan said, even with adequate simulation, rolling upgrades would add a lot of surface area for bugs. Basically you have an extra database to design, build, and test, consisting of various subsets of database A communicating with various subsets of database B. And this bullet can’t be dodged just once; there’s a new opportunity to get it slightly wrong with each and every database release.

Consequently, in terms of downtime impact, rolling upgrades are picking up nickels in front of a steamroller. Let’s say pessimistically that a properly orchestrated upgrade gives you a three-second latency spike, and that your SLA/the needs of your application are such that this constitutes a three-second outage. And maybe there are two FoundationDB non-patch releases a year, so you have six seconds of planned downtime a year from this cause. Well, a single serious database bug that results in a subtle isolation failure, corrupts some of your higher-level database invariants, and requires you to diagnose the problem and manually rebuild your database from backups could pretty easily cause an hour of downtime plus data loss. We would have to deliver 1,200 flawless versions of FoundationDB with rolling upgrades (over six hundred years) to make up for the downtime caused by that one bug. I just don’t think we’re quite that good.

And “total downtime” is not a very good metric for the impact of downtime. One that I like a lot is “mean induced latency”: the amount of time that a randomly selected request arriving at your system must wait for the system to be up. By this metric the impact of an outage is roughly quadratic in the length of the outage, so your one hour of downtime due to a database bug is 1.4 million times worse than the three-second upgrade outage…
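
To spell out the arithmetic behind that figure: a randomly arriving request lands inside an outage of length T with probability proportional to T, and if it does, it waits on average T/2 for the system to come back up, so the mean induced latency contributed by the outage scales with T × T/2 = T²/2. Comparing a one-hour outage to a three-second one then gives (3600 / 3)² = 1,200² ≈ 1.44 million, i.e. roughly 1.4 million times worse.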

And on top of that, if you put the effort spent on rolling upgrades into optimizing startup and recovery times, you could probably improve them a lot. It’s not like we are scraping against theoretical limits on those.

ALL THAT SAID, if you care about this metric you should AVOID THE MEMORY STORAGE ENGINE, because it has to read all data into memory before it can do anything and that takes time. A huge dataset in the memory storage engine is going to take much longer to do an upgrade, even if you do everything else right.

Does this step preclude the possibility of deploying FoundationDB in any sort of container? In my naive view, I’d perhaps have deployed a single Docker container with the various fdb processes, including both fdbmonitor and any fdbserver processes it spawns. However, in this scenario I wouldn’t really be able to update fdbserver without stopping the entire container.

The only upgrade path left would be to stop all instances completely, pull a new Docker image with the newer FDB processes, and start them all again. Presumably the cluster would come back up alright, but in this case I wouldn’t be able to use the fdbcli kill command and I’d likely be looking at minutes rather than milliseconds of mean induced latency.

Are there any examples of successful container-based deployments of FoundationDB by people who have worked around this problem?

The details depend on your orchestration environment, but it should be possible to work something out. fdbmonitor is only involved to the extent that it restarts fdbserver processes.

So, for example, you could run individual fdbserver processes as containers and make sure that when a container terminates a new one is started immediately. Also make sure the new container image gets pulled, so that when this restart happens the new container is started from the new image.
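
As a rough sketch of what a per-process container invocation could look like (the image name and host paths are hypothetical, not an official FoundationDB image, and the exact flags should be checked against fdbserver --help for your version):

docker run -d --name fdb-4500 --net host -v /var/fdb/data/4500:/var/fdb/data -v /etc/foundationdb:/etc/foundationdb example/foundationdb-server:5.2.5 fdbserver --cluster_file /etc/foundationdb/fdb.cluster --public_address auto:4500 --datadir /var/fdb/data

The important part is the orchestration around it: whatever supervises the containers (Kubernetes, a systemd unit wrapping docker run, etc.) must start a fresh container from the updated image as soon as the old one exits, since a plain Docker restart policy would bring the old container back on its old image.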

I’m sure it will be an adventure figuring out how to do this with various off the shelf orchestration tools.

I am trying out the upgrade process for FDB and cannot get it to work in the way @Evan described above.

I’d appreciate some pointers on where it is going wrong.

The setup: I want to upgrade a 5.1.7 cluster to a 5.2.5 cluster.

  • The clients pose no problem: I just followed the documentation and the multi-version client works.
  • The upgrade done the documented way, using package installs, has worked, but with too much downtime. It also doesn’t feel coordinated from FDB’s perspective, and feels rather fragile.

I’m now trying this kill method, but end up with a broken cluster, so perhaps I’m missing some steps or am misinterpreting something. This is my process:

  1. Start with some machine in a cluster: 5.1.7, config double ssd. A client is running compatible with both 5.1.7 and 5.2.5 and generating some load.
  2. Download the new binaries on each machine and extract them:

wget https://www.foundationdb.org/downloads/5.2.5/ubuntu/installers/foundationdb-clients_5.2.5-1_amd64.deb && wget https://www.foundationdb.org/downloads/5.2.5/ubuntu/installers/foundationdb-server_5.2.5-1_amd64.deb

dpkg -x foundationdb-server_5.2.5-1_amd64.deb foundationdb-server_5.2.5-1_amd64 && dpkg -x foundationdb-clients_5.2.5-1_amd64.deb foundationdb-clients_5.2.5-1_amd64

  3. I then want to update the binaries on the server. Looking at what is contained in the extracted deb packages, I’m trying this:

sudo cp foundationdb-server_5.2.5-1_amd64/usr/sbin/fdbserver /usr/sbin/fdbserver && sudo cp -r foundationdb-clients_5.2.5-1_amd64/usr/include/foundationdb /usr/include/foundationdb && sudo cp -r foundationdb-clients_5.2.5-1_amd64/usr/lib/ /usr/lib/

This already fails on the first copy, with the error “cp: cannot create regular file ‘/usr/sbin/fdbserver’: Text file busy”.

What am I missing? Can you clarify the details of your second upgrade step, @Evan, please?

Inside the configuration file that fdbmonitor uses, there is a field that specifies the location of fdbserver: https://apple.github.io/foundationdb/configuration.html?highlight=fdbmonitor#the-configuration-file

You load the new binary to a different location and update the configuration file to point at it. The next time an fdbserver process is restarted, fdbmonitor will launch it from the new location, and therefore at the new version.
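
For example, the relevant section of /etc/foundationdb/foundationdb.conf (the default location on Linux; yours may differ) looks like this, and the command line is what you would point at the newly deployed binary:

[fdbserver]
command = /usr/sbin/fdbserver

Changing command to the path of the new 5.2.5 binary means each fdbserver process picks up the new version the next time it is restarted.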

Hi Evan,

I have a few follow-up queries:

  • What is the process for upgrading the other binaries (fdbmonitor, fdbbackup, fdbcli, etc.)?
  • What happens if a few fdbservers restart with the new binary (for unrelated reasons) while others are still running the old binary? Does it create any temporary or permanent issues? What exactly is the behavior of the FDB cluster while one or more fdbservers are running older binary versions than the others in the cluster? (If the answer depends on how far apart the versions are, it would be great to get an understanding of that as well.)


thanks,
gaurav

You can tell I am not the expert on running FoundationDB, as I think I gave you bad advice with my previous reply.

The actual answer is to create a symbolic link. For instance, you can have a directory with folders for all the different versions of FoundationDB, plus a “current” folder that is a symlink to the latest one. The configuration file uses the “current” folder; when you want to switch versions, you change the symbolic link to point at the new folder.
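
A minimal sketch of that layout (the /opt/foundationdb paths here are just an example, not something FoundationDB mandates):

sudo mkdir -p /opt/foundationdb/5.2.5/bin && sudo cp foundationdb-server_5.2.5-1_amd64/usr/sbin/fdbserver /opt/foundationdb/5.2.5/bin/

sudo ln -sfn /opt/foundationdb/5.2.5 /opt/foundationdb/current

With command = /opt/foundationdb/current/bin/fdbserver in foundationdb.conf, switching versions is just repointing the symlink and then bouncing the processes with the fdbcli kill described earlier.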

If the server restarts for other reasons, it would prematurely switch to the new version. For patch releases, the new version is allowed to communicate with the old version, so the database will continue operating as usual. For major/minor version upgrades, when the new version attempts to talk to the old version its communications will be rejected; it is equivalent to that process being down until the upgrade is finished.

Upgrading fdbmonitor will restart all of its child fdbserver processes. There is no built-in mechanism to synchronize the restarting of fdbmonitor across machines, so unless you have your own mechanism that can restart them in a tight window, the best approach is to apply the upgrade in a rolling fashion. I would recommend not doing this at the same time that you are also upgrading the fdbserver processes.

Each time you upgrade an instance of fdbmonitor that has children serving certain roles (e.g. tlogs, resolvers, proxies, master, cluster controller), the cluster will undergo a recovery. This will appear as a brief (a few seconds) latency spike or downtime to your clients. If you are using process classes, you can optionally exclude all non-storage-class processes that are children of the fdbmonitor being upgraded before killing it, which may help reduce the time spent in recovery by avoiding the need to detect the failed processes. Be sure to include those processes again after you’ve restarted fdbmonitor. It’s probably not desirable to exclude the storage servers unless you don’t intend to start the new fdbmonitor instance right away, because excluding them would move all of the data off of those storage servers.
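
As a hedged illustration (the addresses below are placeholders for the non-storage processes on the host being upgraded), the exclude/include dance would look something like:

fdbcli --exec "exclude 10.0.0.5:4500 10.0.0.5:4501"

(restart fdbmonitor on that host and wait for its processes to come back)

fdbcli --exec "include 10.0.0.5:4500 10.0.0.5:4501"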

Because of the above, we don’t recommend upgrading fdbmonitor with every release. Instead, it should only be upgraded when there are changes made to it that you need. It’s common for a release to not include any changes to fdbmonitor.

For the other binaries, you can just upgrade them at some point around or between deploying the new fdbserver binaries and bouncing them. If you are performing a major/minor version upgrade and want to minimize the interruption to these other activities, you could start up processes at the new version before the upgrade and not kill the processes at the old version until after the upgrade.

And as a side note, the “Text file busy” error happens because you can’t open a file for writing while it is being executed. If you use mv instead of cp, this should work fine.
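
For example, reusing the path from your commands above:

sudo mv foundationdb-server_5.2.5-1_amd64/usr/sbin/fdbserver /usr/sbin/fdbserver

mv replaces the directory entry for /usr/sbin/fdbserver rather than opening the running executable for writing, so it avoids the “Text file busy” error.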