FoundationDB Summit 2019: Managing FoundationDB at Scale

Speaker: @john_brownlee
Slides: Managing FoundationDB
Recording: https://www.youtube.com/watch?v=A3U8M8pt3Ks


The Kubernetes Operator is now public at FoundationDB/fdb-kubernetes-operator.


Is there a PDF version of the slides available? I’m not sure how to open a .key file.

PDF version here: http://jan.prima.de/u/FDB_Summit_2019_Presentation.pdf


@john_brownlee Hi John. I’m not a native speaker and don’t know what “bounce” means.

For the following slide, my understanding is: whenever one of the proxies or log servers dies, we restart all fdb processes (including proxies, log servers, storage servers, fdbmonitor, and coordinators? Which processes, if any, are not restarted?). That is what “bounce everything at once” means. Right?

Yes, that’s correct. “Bounce” in this context means “restart”, and every process in the cluster gets restarted.

You mean all processes? The failure of a single process in any role will trigger a bounce of every process in every role? Isn’t that strategy excessive? It does sound controversial.

If so, how long is the database unavailable under this strategy? And can the upper layer (the client) tolerate that, perhaps by retrying?

This slide is describing deliberate restarts, such as when changing the command-line parameters for the processes, rather than failures. Failures don’t trigger restarts of the other processes, but the failure of a process in the transaction subsystem will cause a recovery, where the database recruits a new transaction subsystem. These recoveries take a single-digit number of seconds, which the clients will experience as operations blocking until the database is available again. For the case of process restarts, the unavailability period will be slightly longer, but when running through fdbmonitor that additional period can be kept small.
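To make “operations blocking until the database is available again” concrete, here is a minimal, self-contained simulation of a client riding out a recovery window by retrying with capped exponential backoff. It is a sketch under stated assumptions, not the FoundationDB API: `SimulatedCluster`, `RecoveryInProgress`, `FakeClock` and `run_with_retry` are hypothetical names, and the 3-second recovery is an assumed value in the single-digit range mentioned above. The official bindings provide their own retry loop (in Python, the `@fdb.transactional` decorator), so application code normally does not hand-roll this.

```python
"""Hypothetical simulation of a client riding out a brief recovery window.

None of these names are part of the FoundationDB bindings; they only
illustrate the retry-until-available behaviour described in the thread.
"""


class RecoveryInProgress(Exception):
    """Hypothetical stand-in for a retryable error raised during recovery."""


class FakeClock:
    """Deterministic clock so the simulation runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


class SimulatedCluster:
    """Rejects writes until `available_at` (in simulated seconds)."""

    def __init__(self, clock: FakeClock, available_at: float) -> None:
        self.clock = clock
        self.available_at = available_at
        self.data = {}

    def set(self, key: str, value: str) -> None:
        if self.clock.now < self.available_at:
            raise RecoveryInProgress(f"recovering at t={self.clock.now:.2f}s")
        self.data[key] = value


def run_with_retry(op, clock: FakeClock, initial_delay: float = 0.01,
                   max_delay: float = 1.0, max_attempts: int = 100) -> int:
    """Call `op` until it succeeds, backing off between attempts.

    Returns the number of attempts used. The delay doubles after each
    failure and is capped at `max_delay` so the client notices quickly
    once the cluster has finished recovering.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            op()
            return attempt
        except RecoveryInProgress:
            clock.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError("cluster did not become available in time")


if __name__ == "__main__":
    clock = FakeClock()
    # Assumed 3-second recovery, within the "single-digit number of
    # seconds" range described in the answer above.
    cluster = SimulatedCluster(clock, available_at=3.0)
    attempts = run_with_retry(lambda: cluster.set("hello", "world"), clock)
    print(f"succeeded after {attempts} attempts at t={clock.now:.2f}s")
```

From the caller’s point of view the write simply takes a few seconds longer; nothing fails. Capping the backoff keeps the extra wait after recovery short, which matters more than the number of attempts during a window this brief.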
