Upgrading FDB in PROD

Can someone share the exact steps for upgrading FDB in production without downtime? This is my first time upgrading FDB, so I am not sure which steps are needed.

I was reading the 7.1 manual, which simply says to install the new libraries and stop/start the foundationdb service. That does not explain how to avoid downtime for both the server and the application. Please note that my current fdbserver install path is /usr/sbin/fdbserver (not a version-specific path).

I came across the wiki page Upgrading FoundationDB · apple/foundationdb Wiki · GitHub (last updated in 2019), which says to work through the multi-version client upgrade steps first and then the server upgrade. Are these steps necessary to upgrade production without downtime?

For the server upgrade it says “Install the new fdbserver binaries alongside the old binaries, with each binary in a path that contains its version.” but it is not clear to me how to install the FDB server and client in a different path for every version change on Linux. I checked the rpm command with the prefix option, but I am not sure that is the one I need to use. If anyone can share an example, that would be great.

I assume the 2nd step listed is asking to change the following path in the configuration file to match the version needed, since we are running fdbmonitor.
[fdbserver]
command = /usr/sbin/fdbserver

You can consider downloading individual binaries from GitHub. Previously, we had to extract the binaries from the rpm/deb package, but now it's much easier.

You can even automate this. See here and here for how I do it with Nix. You could do something similar with your automation tool.

Here is a gist of my notes. There are footnotes to relevant forum posts.

Thanks Rajiv. I am documenting the steps based on the info provided, so please keep an eye on this thread as I will need your help to review them. I will update the thread soon.

I am not clear on the symlinks I need to create to use the new FDB version (before the fdbcli kill all command) if I download the individual binaries.

If I download the binaries as follows, I can set the symlinks for fdbserver and fdbcli, but I am not clear on how to change them for fdbmonitor, fdbbackup, fdbrestore, and fdbdr. I am also not clear what change or symlink is needed for libfdb_c.x86_64.so.

Download binaries:
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbbackup.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbcli.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbmonitor.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbserver.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/libfdb_c.x86_64.so
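A minimal sketch of fetching these into a version-specific directory, assuming `curl` is installed and that release assets keep the `<name>.x86_64` naming shown in the URLs above. The function names and the destination layout are illustrative, not part of FoundationDB.

```shell
# Build the GitHub release URL for one asset, following the URL pattern listed above.
# $1 = FDB version (e.g. 7.1.9), $2 = asset name (e.g. fdbserver.x86_64)
fdb_release_url() {
    printf 'https://github.com/apple/foundationdb/releases/download/%s/%s\n' "$1" "$2"
}

# Download the four executables into <dest>/<version>/ and mark them executable.
# The <dest>/<version> layout is an assumption made for illustration.
# $1 = FDB version, $2 = destination root (e.g. /opt/foundationDB)
fdb_download_binaries() {
    version=$1
    dest=$2/$version
    mkdir -p "$dest" || return 1
    for name in fdbbackup fdbcli fdbmonitor fdbserver; do
        curl --fail --location --silent --show-error \
            --output "$dest/$name.x86_64" \
            "$(fdb_release_url "$version" "$name.x86_64")" || return 1
        chmod 0755 "$dest/$name.x86_64"
    done
}

# Example (not run here):
#   fdb_download_binaries 7.1.9 /opt/foundationDB
```

Keeping each version in its own directory means the old binaries stay in place if the upgrade has to be backed out.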

Links for fdbserver and fdbcli:
ln -s /opt/foundationDB/7.1.9/fdbserver.x86_64 /usr/sbin/fdbserver
ln -s /opt/foundationDB/7.1.9/fdbcli.x86_64 /bin/fdbcli
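One thing to watch with the two commands above: plain `ln -s` refuses to create the link when the destination already exists, which is the case for an RPM-installed /usr/sbin/fdbserver. A hedged sketch of a helper that keeps a copy of the existing file and then replaces it with a symlink follows. The function name and the `.rpm-orig` suffix are made up for illustration.

```shell
# Point an unversioned path at a versioned binary, replacing whatever is there.
# If the destination is a regular file (e.g. the RPM-installed binary), a copy is
# kept next to it first. The ".rpm-orig" suffix is an illustrative choice.
# $1 = versioned binary (e.g. /opt/foundationDB/7.1.9/fdbserver.x86_64)
# $2 = unversioned path  (e.g. /usr/sbin/fdbserver)
link_versioned() {
    target=$1
    link=$2
    [ -f "$target" ] || { echo "missing target: $target" >&2; return 1; }
    if [ -f "$link" ] && [ ! -L "$link" ]; then
        cp -p "$link" "$link.rpm-orig" || return 1
    fi
    # -f replaces an existing destination, -n avoids descending into a symlinked directory.
    ln -sfn "$target" "$link"
}

# Example (not run here):
#   link_versioned /opt/foundationDB/7.1.9/fdbserver.x86_64 /usr/sbin/fdbserver
```

Changing the link does not affect an fdbserver process that is already running; the new binary is only picked up the next time the process is started.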

I see the following executables from the RPM install. I can use a symlink for /bin/fdbbackup, but I am not sure how to update /bin/fdbrestore and /bin/fdbdr.

-rwxr-xr-x 1 root root 11987128 Mar 2 20:01 /bin/fdbcli
-rwxr-xr-x 1 root root 16016800 Mar 2 20:01 /bin/fdbbackup
-rwxr-xr-x 1 root root 16016808 Mar 2 20:01 /bin/fdbrestore
-rwxr-xr-x 1 root root 16016800 Mar 2 20:01 /bin/fdbdr

Also, the backup configuration uses “command = /usr/lib/foundationdb/backup_agent/backup_agent”, so how will this executable get updated? Or will it call /bin/fdbbackup, which is a symlink to the new FDB version?

For fdbmonitor, the service file has the following entry, so do I need to create a symlink for /usr/lib/foundationdb/fdbmonitor pointing to the new version? According to the documentation it looks like I don't need to upgrade fdbmonitor every time, but if I do need to, I can create a symlink. Also, do I need to stop/start the service on all nodes if the symlink is changed to upgrade fdbmonitor (which I think I do)?

ExecStart=/usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize

My currently running processes look like this:
root 2592 1 0 Jun14 ? 00:00:00 /usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize

foundat+ 2601 2592 0 Jun14 ? 00:02:50 /usr/lib/foundationdb/backup_agent/backup_agent --cluster_file /etc/foundationdb/fdb.cluster --logdir /data/foundationdb/log

foundat+ 2602 2592 0 Jun14 ? 00:10:29 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /data/foundationdb/data1 --listen_address public --logdir /data/foundationdb/log --public_address auto:4500

foundat+ 2603 2592 1 Jun14 ? 00:23:34 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /data/foundationdb/data2 --listen_address public --logdir /data/foundationdb/log --public_address auto:4501

I am not familiar with fdbbackup and fdbrestore.

If you happen to have podman, you can run my CI container.

sudo podman pull ghcr.io/fdb-rs/fdb-7_1_3:3205cb0f188a

sudo podman run --rm --name fdb --tty ghcr.io/fdb-rs/fdb-7_1_3:3205cb0f188a

The above command should launch a container running FoundationDB under systemd.

You can then log in to the container as root.

sudo podman exec --tty --interactive fdb /bin/bash

I do not manage FDB binaries in /usr. In /opt/fdb you will find my current organization.

Let me elaborate my setup a bit more.

  1. /opt/fdb/cli/7.1.3/fdbcli is the fdbcli binary. The main reason for organizing it this way is that if the cluster needs to be upgraded to, let's say, 7.2.0, then 7.1.3/fdbcli would no longer work. So, in that case, you would want to introduce /opt/fdb/cli/7.2.0/fdbcli prior to the upgrade. That way you will have a way to connect to the cluster after the upgrade.

  2. /opt/fdb/client-lib contains the client library. This is the library against which the FDB client application would link. In Tokio/Rust you can use the option RUSTC_LINK_SEARCH_FDB_CLIENT_LIB, to link to libfdb_c.so present in this directory.

  3. /opt/fdb/client-lib-dir/ contains additional client libraries. So, prior to a cluster upgrade, I would introduce new versions of the client libraries in this directory. This could be as simple as cutting traffic and rolling out a new container or VM. The application would make use of the EXTERNAL_CLIENT_DIRECTORY feature, so the moment it detects a cluster upgrade, it will switch to the newer version.

  4. /opt/fdb/server/7.1.3/fdbserver is the fdbserver binary. Similar to fdbcli, a newer version would go into /opt/fdb/server/7.2.0/fdbserver.

  5. /opt/fdb/monitor/fdbmonitor contains fdbmonitor. As you can see there is only one fdbmonitor binary that is required.

  6. /opt/fdb/{data,log} contains data and logs.

  7. /opt/fdb/conf/foundationdb.conf contains the configuration file.
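For anyone recreating this organization outside the container, the directory skeleton can be sketched as below. The function name is hypothetical; the paths mirror the list above, with the root and version passed in as parameters.

```shell
# Create the versioned layout described in the list above under a chosen root.
# $1 = root directory (e.g. /opt/fdb), $2 = FDB version (e.g. 7.1.3)
make_fdb_layout() {
    root=$1
    version=$2
    mkdir -p \
        "$root/cli/$version" \
        "$root/server/$version" \
        "$root/client-lib" \
        "$root/client-lib-dir" \
        "$root/monitor" \
        "$root/conf" \
        "$root/data" \
        "$root/log"
}

# Example (not run here): prepare 7.1.3 now, then add 7.2.0 alongside it before an upgrade.
#   make_fdb_layout /opt/fdb 7.1.3
#   make_fdb_layout /opt/fdb 7.2.0
```

Calling it again with a new version only adds the new cli/ and server/ subdirectories; the existing ones are left untouched.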

You would not upgrade fdbmonitor. It would get rolled out as a part of your image.

If you do systemctl status foundationdb.service in the container you will see the following.

CGroup: /system.slice/foundationdb.service
        ├─13 /opt/fdb/monitor/fdbmonitor --conffile /opt/fdb/conf/foundationdb.conf
        └─16 /opt/fdb/server/7.1.3/fdbserver --cluster_file /opt/fdb/conf/fdb.cluster --datadir /opt/fdb/data/4500 --listen_address public --logdir /opt/fdb/log --public_address auto:4500

So, systemd started fdbmonitor, which in turn read foundationdb.conf and started the fdbserver process.

When upgrading, you need to roll out a newer version of foundationdb.conf. This can be done by manipulating symlinks. (Please be aware of the kill_on_configuration_change option, which is true by default. You might want to set this to false so you don’t accidentally restart the fdbserver processes.)

Once the newer version of foundationdb.conf is set up, you can then bounce the cluster. This would only restart the fdbserver processes, and clients would adjust to the newer version.
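A rough sketch of the symlink manipulation, assuming foundationdb.conf is itself a symlink to a per-version file in the same directory. The `foundationdb.conf.<version>` file names and the function name are hypothetical, chosen only for this example.

```shell
# Derive a new conf from the current one by swapping the version in the fdbserver
# path, then repoint the foundationdb.conf symlink at it.
# The foundationdb.conf.<version> naming is an assumption, not an FDB convention.
# $1 = conf directory, $2 = old version, $3 = new version
switch_conf_version() {
    conf_dir=$1
    old=$2
    new=$3
    src=$conf_dir/foundationdb.conf.$old
    dst=$conf_dir/foundationdb.conf.$new
    # Escape the dots so that "7.1.3" is matched literally by sed.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed "s|/server/$old_re/fdbserver|/server/$new/fdbserver|" "$src" > "$dst" || return 1
    ln -sfn "foundationdb.conf.$new" "$conf_dir/foundationdb.conf"
}

# Example (not run here):
#   switch_conf_version /opt/fdb/conf 7.1.3 7.2.0
```

With kill_on_configuration_change set to false, the running fdbserver processes should keep going after the switch and only start the new binary when the cluster is bounced. Try this in staging first to confirm the behaviour on your version.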

I hope this helps!

Thanks for the details. I am trying to digest this information, and the following is my understanding. I am not using Docker/pods; all are physical servers in our case. Also, the original installs on all PROD servers were done using the RPM install method (I wish version-specific paths had been used the first time, but it is too late as it was done in the past).

Please correct me if you think anything is wrong in these steps. I will work on automation later, once I have tested these steps manually.

**Step 1: Upgrade Client Servers**

Before upgrading the server, do the following setup on the client machines so that clients can connect to the current version and then, after the upgrade, to the new version without downtime (apart from the short period until all server processes come back after the kill).

a) Download
https://github.com/apple/foundationdb/releases/download/7.1.9/libfdb_c.x86_64.so to
/opt/foundationDB/multiversion-client/libfdb_c_7.1.9.x86_64.so

b) Copy /usr/lib64/libfdb_c.so to /opt/foundationDB/multiversion-client/libfdb_c_6.3.24.x86_64.so. This is the current version, which was installed using RPM. In future I will not need this step, as step a) will take care of multi-version support.

c) set environment variable
FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/opt/foundationDB/multiversion-client

Java applications can call fdb.options().setExternalClientDirectory(...) instead of setting the FDB_* variable.

d) Bounce the client application.

e) Check cluster.clients.supported_versions in the JSON status from the DB server to confirm the client versions, protocol versions, and IP addresses connected.
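For the check in step e), a small text filter over the `status json` output might look like the sketch below. It assumes the entries under cluster.clients.supported_versions carry a `client_version` field and that this key does not appear elsewhere in the output; it matches text rather than parsing JSON, so treat it as a quick look, not a validation tool. The function name is made up.

```shell
# Print the distinct client versions found in `status json` output read from stdin.
# This is plain text matching on the "client_version" key, not real JSON parsing.
client_versions_from_status() {
    grep -o '"client_version"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed 's/.*:[[:space:]]*"\(.*\)"/\1/' \
        | sort -u
}

# Example (not run here):
#   fdbcli --exec 'status json' | client_versions_from_status
```

After step d), both the current and the new client version should show up in the list before the servers are touched.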

Going forward we will not use the RPM or /usr/lib64/libfdb_c.x86_64.so, and will keep two versions of the client library: the current one and the new one. Clients might also need to upgrade their language-specific bindings to be compatible with the new version.
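Steps a) to c) of Step 1 can be sketched roughly as follows. The helper name and the `my-app` binary are hypothetical; the directory and the version-tagged file names are the ones used in the steps above.

```shell
# Copy a client library into the multi-version directory under a version-tagged name.
# $1 = source library   (e.g. /usr/lib64/libfdb_c.so)
# $2 = its FDB version  (e.g. 6.3.24)
# $3 = target directory (e.g. /opt/foundationDB/multiversion-client)
stage_client_lib() {
    src=$1
    version=$2
    dir=$3
    mkdir -p "$dir" || return 1
    cp -p "$src" "$dir/libfdb_c_${version}.x86_64.so"
}

# Example (not run here):
#   stage_client_lib /usr/lib64/libfdb_c.so 6.3.24 /opt/foundationDB/multiversion-client
#   stage_client_lib /tmp/libfdb_c.x86_64.so 7.1.9 /opt/foundationDB/multiversion-client
#   export FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY=/opt/foundationDB/multiversion-client
#   ./my-app    # hypothetical application; it must be restarted to pick up the variable
```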

**Step 2: Upgrade Server**

a) Set kill_on_configuration_change=false in the conf file.

b) Download following binaries to /opt/foundationDB/7.1.9/

https://github.com/apple/foundationdb/releases/download/7.1.9/fdbbackup.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbcli.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbmonitor.x86_64
https://github.com/apple/foundationdb/releases/download/7.1.9/fdbserver.x86_64

c) Create a symlink for fdbserver
ln -s /opt/foundationDB/7.1.9/fdbserver.x86_64 /usr/sbin/fdbserver

d) Connect with fdbcli, which will still be version 6.3.24 (the current version), and run kill; kill all; status

e) ln -s /opt/foundationDB/7.1.9/fdbcli.x86_64 /bin/fdbcli (this makes fdbcli the latest version). Connect again using fdbcli and check the status of the cluster.

f) ln -s /opt/foundationDB/7.1.9/fdbbackup.x86_64 /bin/fdbbackup

g) At this point, if the cluster and application are both running fine and fdbcli --version shows the latest version, we can decide to upgrade fdbmonitor with a symlink:

 ln -s /opt/foundationDB/7.1.9/fdbmonitor.x86_64 /usr/lib/foundationdb/fdbmonitor

h) Rolling bounce with “sudo service foundationdb restart”, making sure enough services are available to process the application workload without downtime.
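Steps d) and e) of Step 2 can be sketched as a single helper that takes the paths of the old and new fdbcli explicitly instead of relinking /bin/fdbcli in between. The function name and the BOUNCE_WAIT_SECONDS variable are made up for this example, and the wait time is a guess; the command sequence is the one from step d).

```shell
# Bounce all fdbserver processes with the old fdbcli, then check status with the new one.
# $1 = path to the current-version fdbcli, $2 = path to the new-version fdbcli
# BOUNCE_WAIT_SECONDS is an illustrative knob, not an FDB setting.
bounce_and_check() {
    old_cli=$1
    new_cli=$2
    # Command sequence taken from step d): "kill" with no arguments lists the
    # processes, "kill all" then kills them so fdbmonitor restarts them.
    "$old_cli" --exec 'kill; kill all; status' \
        || echo "old fdbcli returned an error (it may no longer be able to talk to the upgraded cluster)" >&2
    sleep "${BOUNCE_WAIT_SECONDS:-10}"
    "$new_cli" --exec 'status'
}

# Example (not run here):
#   bounce_and_check /bin/fdbcli /opt/foundationDB/7.1.9/fdbcli.x86_64
```

Passing both paths avoids a window where the only fdbcli on the host is the wrong version for the cluster.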

I still need to find out:

  1. How to upgrade fdbrestore and fdbdr, or where to get their binaries.
  2. Whether upgrading fdbbackup is sufficient to take backups with the new version, since the configuration file uses “command = /usr/lib/foundationdb/backup_agent/backup_agent”, and if not, how to upgrade backup_agent to the new version.

I found the RPM install to be conceptually lacking for the operational needs of running an FDB cluster. fdb-kubernetes-operator seems to provide much better tooling for running FDB. Even though we have no plans to run Kubernetes, studying its source was helpful to me.

It's just my opinion, but maybe you could consider first migrating prod to a non-RPM-based setup and then initiating the cluster upgrade.

In any case, please test your upgrade process both on the client and server side in staging.

Thanks for the help. I will create new questions to see if anyone knows about fdbrestore, fdbdr, and backup_agent, as without them it looks like the upgrade will not cover all the binaries needed. One option is that, after the DB upgrade using the above method, I can run an RPM upgrade to get the latest fdbrestore, fdbdr, and backup_agent, provided the RPM upgrade doesn't remove the symlinks created in the steps above.

I will test this in my playground cluster anyway. Again, thanks for your help.

@chetan_pg We are also planning to upgrade our PROD clusters from 6.2.x → 7.1.x. Thank you for summarizing the steps above. Is there anything else we should be careful about? Did you run into any issues?

fdbrestore, fdbdr, and backup_agent are the same binary as fdbbackup. They are just named differently. When running, the process uses the binary name to decide what to do accordingly.
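Given that, one way to provide the other names from a single downloaded fdbbackup binary is to create symlinks, since a program started through a symlink sees the link's name as the name it was invoked with. A minimal sketch, with a hypothetical function name:

```shell
# Create the alternate names as symlinks to one versioned fdbbackup binary.
# $1 = versioned binary (e.g. /opt/foundationDB/7.1.9/fdbbackup.x86_64)
# $2 = directory that should hold the names (e.g. /bin)
link_backup_aliases() {
    fdbbackup=$1
    dir=$2
    for name in fdbbackup fdbrestore fdbdr backup_agent; do
        ln -sfn "$fdbbackup" "$dir/$name" || return 1
    done
}

# Example (not run here):
#   link_backup_aliases /opt/foundationDB/7.1.9/fdbbackup.x86_64 /bin
```

In the RPM layout discussed earlier, backup_agent lives at /usr/lib/foundationdb/backup_agent/backup_agent rather than in /bin, so that name would need its own link in that directory (or the conf would need to point at the new location).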