I could do with an ELI5 (Explain it Like I’m 5 years old) explanation on upgrading a globally distributed FDB cluster,
not a large one, but a redundant one, covering both the server side and the client side.
The specific situation is an upgrade from 7.1.29 to 7.3.33, but the general process is what interests me. I’ve read all the forum posts, docs, notes, and gists, and it’s still not clear to me.
There is a lot of discussion around server-side upgrades, which I can summarise as:
- do lots of prior testing
- deploy the new binaries everywhere all at once
- bounce all the servers very quickly, especially across multiple regions
- update your backups ASAP too
- hope that this works just fine and there is no fallout
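The "bounce all the servers very quickly" step is the part I've tried to pin down concretely. Here is my sketch of it, based on the `kill; kill all` idiom discussed on these forums (please correct me if that idiom is wrong); the cluster-file path is just the Linux default, and the whole thing is guarded so it does nothing on a machine without `fdbcli`:

```python
# Guarded sketch of the cluster-wide bounce. My understanding is that the
# first `kill` makes fdbcli enumerate all processes, and `kill all` then
# restarts every fdbserver process near-simultaneously, so they all come
# back up on the newly deployed binaries.
import shutil
import subprocess

def bounce_cluster(cluster_file: str = "/etc/foundationdb/fdb.cluster") -> str:
    # Safety guard: do nothing if fdbcli is not installed on this host.
    if shutil.which("fdbcli") is None:
        return "fdbcli not found; skipping bounce"
    result = subprocess.run(
        ["fdbcli", "-C", cluster_file, "--exec", "kill; kill all; status"],
        capture_output=True, text=True,
    )
    return result.stdout

print(bounce_cluster())
```

If this is roughly right, I'd still like to know how tight the timing has to be when the processes span multiple regions.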
What am I supposed to do about clients? My understanding is that there are two “protocols” in play:
- the API version, such that a client can request earlier version compatibility
- the `fdbcli` version, the network protocol, which must always match the servers precisely
- the C client is apparently capable of choosing the `.so` dynamically to match the server, if multiple versions are installed
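To check my understanding of those two notions, here is the mental model I've built (my own toy code, not anything from the FDB source; the version numbers are illustrative, not real protocol constants):

```python
# Toy model of the two compatibility rules as I understand them:
# - wire protocol: libfdb_c and fdbserver must match exactly
# - API version: a newer library accepts any older requested API version
def protocol_compatible(client_protocol: int, server_protocol: int) -> bool:
    # Exact match required; this is why a 7.1 libfdb_c cannot talk
    # to a 7.3 server at all.
    return client_protocol == server_protocol

def api_version_accepted(requested: int, library_max: int) -> bool:
    # Backwards compatible; this is why requesting API version 710
    # still works against a 7.3 client library.
    return requested <= library_max

assert not protocol_compatible(710, 730)  # wire protocol: must match
assert api_version_accepted(710, 730)     # API version: older is fine
```

Is this the right way to think about it, i.e. only the wire protocol forces the multi-version dance, while the API version can lag behind indefinitely?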
After reading all this I do not understand how I should approach upgrading
the overall infrastructure - servers & clients.
It seems I should:
- have packages for both old & new FDB versions for client & server available
- on all clients, deploy the new `libfdb_c.so.7.3.33` alongside the existing `libfdb_c.so.7.1.0` so that it can use the appropriate connections as needed, and let them restart; they will continue using the older versions initially
- on all servers, deploy the new binaries, bounce all servers as fast as practical
- over time, upgrade clients to the new API version, to match the server version
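If I've understood the multi-version client correctly, the client-side step above amounts to something like this before the application initializes the FDB bindings (the directory path is hypothetical; `EXTERNAL_CLIENT_DIRECTORY` is the network option I believe is involved, settable through an environment variable):

```python
# Sketch: configure the multi-version client before the FDB client
# library initializes. Network options can apparently be set via
# environment variables of the form FDB_NETWORK_OPTION_<OPTION_NAME>.
import os

# Hypothetical directory holding the additional client library
# (e.g. the 7.3.33 libfdb_c) alongside the primary 7.1 library
# the application links against.
os.environ["FDB_NETWORK_OPTION_EXTERNAL_CLIENT_DIRECTORY"] = (
    "/usr/lib/foundationdb/external-clients"
)

# The application itself would then continue unchanged, e.g.:
#   import fdb
#   fdb.api_version(710)  # stay on the old API version during the upgrade
# with the primary library routing traffic through whichever external
# client library speaks the cluster's current protocol version.
```

Does a client configured like this really ride through the server bounce with nothing more than a reconnect?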
Have I got this right?
Are there any tricks to simplifying this, or reducing the risk, or allowing more time during the upgrade?
Overall, this process feels very fragile, with many points where a large server and client fleet must be upgraded in lock-step, restarting basically everything in the process, within a very short window.
I am not sure that, operationally, I could carry out such an upgrade in a way that could be rolled back if required, without significant downtime.
What I would prefer:
- older clients (6.x and 7.x) can seamlessly connect to a newer server
- cluster upgrades should be incrementally possible
Is that possible, somehow? Does that change for a multi-region setup?