Operating FoundationDB in an elastic environment

(Daniel Dunbar) #1

Is there any guide or information on how to operate FoundationDB in an elastic environment (or the degree to which this is currently possible)?

The impression I get is that FDB derives its notion of identity from the entries in fdb.cluster, and those entries are expected to be constant across both the cluster and the clients. If FDB is living behind some kind of proxy/service mesh, can the client have a different view of the cluster’s IP/ports than the cluster itself (the answer seems to be no when I try it)?

I realize there are substantially more issues than just this to be concerned about when FDB is running in an environment where nodes are expected to be far more ephemeral than one might want for a database… at this point I’m mostly curious as to whether it is even possible and then how well it can possibly work.

(matthew zeier) #2

At Wavefront we have effectively three Fdb tiers. One is statically defined as a KV store and tends not to grow.

The other two are dynamic and grow/contract based on how we consume telemetry. This is elastic - but not automatically elastic - and we’ve built tooling to support that, primarily Terraform & Ansible.

Hosts are launched via Terraform using pre-built AMIs (via Packer/Ansible). These AMIs are “instance type” based and derive their disk layout and /etc/foundationdb/foundationdb.conf largely as a function of the instance type. An i3.16xlarge vs. an i3.4xlarge will have a different set of ssd/mem processes as a function of the memory/CPU footprint. Disks are likewise built as a function of the instance type (and specifically number of NVMe instance stores) + data volume size.
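To make the "conf as a function of instance type" idea concrete, here is a minimal sketch of how such a generator might look. The instance types, process counts, and ports are illustrative placeholders, not Wavefront's actual layout:

```shell
# Sketch (illustrative values): derive the number of ssd-class fdbserver
# processes from the EC2 instance type, then emit foundationdb.conf
# process sections. A real layout would also size memory-class processes
# and per-process data/log directories.

ssd_procs_for() {
  case "$1" in
    i3.16xlarge) echo 16 ;;  # big box: many storage processes
    i3.4xlarge)  echo 4  ;;  # smaller box: fewer
    *)           echo 2  ;;  # conservative default
  esac
}

write_conf() {  # $1 = instance type, $2 = output path
  n=$(ssd_procs_for "$1")
  {
    printf '[general]\ncluster_file = /etc/foundationdb/fdb.cluster\n\n'
    printf '[fdbserver]\nclass = storage\n\n'
    i=0
    while [ "$i" -lt "$n" ]; do
      # one [fdbserver.<port>] section per process
      printf '[fdbserver.%d]\n' $((4500 + i))
      i=$((i + 1))
    done
  } > "$2"
}
```

Baking this into the AMI (or running it from cloud-init) is what lets one image serve every instance type.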

These AMIs are stateless. Terraform plumbs in instance tags and on instance launch, we have two separate processes that define state and join this new host to the cluster:

  1. @reboot cronjob that runs ansible-pull. Hostnames, telegraf, central logging and volume tagging are done here. This is mostly per-host configuration that we do with Ansible.
  2. An init.d script runs on boot that, among other things, adds this host into Wavefront (for monitoring) and pulls the cluster file (/etc/foundationdb/fdb.cluster) down from S3. We also run a memory-engine cluster on the same hosts, so the memory cluster file is pulled down as well.
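The pull side of step 2 might look roughly like this sketch (the bucket name, function name, and paths are hypothetical; the real init.d script does considerably more):

```shell
# Sketch: on boot, pull the cluster files from S3 so a freshly launched,
# stateless instance can join the existing cluster. Bucket and key names
# are hypothetical.

pull_cluster_files() {  # $1 = S3 bucket, $2 = destination directory
  aws s3 cp "s3://$1/fdb.cluster"    "$2/fdb.cluster"    || return 1
  aws s3 cp "s3://$1/memory.cluster" "$2/memory.cluster" || return 1
  # fdbserver must be able to rewrite the cluster file itself when
  # coordinators change, so hand ownership to the foundationdb user.
  chown foundationdb:foundationdb "$2"/*.cluster 2>/dev/null || true
}
```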

There are secondary processes that push changes to the cluster files (fdb.cluster and, in our case, memory.cluster) back up to S3, so any coordinator changes are more or less automatically kept in sync. This is true for all the Fdb tiers, including the statically defined KV store (so we can handle hard instance replacements somewhat automatically).
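One way to sketch that push-back process (the function name, checksum file, and S3 destination are illustrative; a real version might watch the file with inotify instead of polling):

```shell
# Sketch: push the local cluster file back to S3 whenever it has changed,
# e.g. after a coordinator change rewrites it. Run periodically from cron.

push_if_changed() {  # $1 = local cluster file, $2 = s3:// destination
  sum_file="$1.pushed.md5"
  new_sum=$(md5sum "$1" | awk '{print $1}')
  old_sum=$(cat "$sum_file" 2>/dev/null || true)
  if [ "$new_sum" != "$old_sum" ]; then
    # Only record the checksum if the upload actually succeeded, so a
    # failed push is retried on the next run.
    aws s3 cp "$1" "$2" && echo "$new_sum" > "$sum_file"
  fi
}
```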

We actually do a post-launch conformance pass, so while Fdb could start on launch - and used to - the AMIs now have Fdb installed but disabled on boot. The Ansible conformance job does many things, but one of them is to start foundationdb and enable it on boot.

At our scale, it’s sometimes disruptive to auto-join on a large cluster. So we have Terraform drop all the new instances, conform them, and then join.

We contract (or replace all instances with larger/smaller ones) as SOP. This has been reduced to four Ansible commands that:

  1. Tag all the existing instances with a specific EC2 tag
  2. Using those EC2 tags, re-coordinate and exclude (we do this twice, once for the ssd tier, once for memory)
  3. Disable instance termination protection, stop the “old” instances, validate we aren’t hard down (!), and eventually terminate the excluded nodes and issue an “include all” to the cluster.
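The fdbcli side of steps 2 and 3 might look something like this sketch (the wrapper function names and addresses are placeholders; in practice the address lists would be resolved from the EC2 tags):

```shell
# Sketch: re-coordinate away from outgoing instances, exclude them, and
# later clear the exclusion list once they are terminated.

recoordinate_and_exclude() {  # $@ = addresses of instances being retired
  # Move coordinators off the outgoing instances first.
  fdbcli --exec "coordinators auto" || return 1
  # exclude triggers data movement off these processes and, by default,
  # waits until it is safe to shut them down.
  fdbcli --exec "exclude $*" || return 1
}

finish_contraction() {
  # After the old instances are gone, clear the exclusion list.
  fdbcli --exec "include all"
}
```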

(Alex Miller) #3

And to answer the one explicit question:

Not really. Particularly for storage servers, their IP address is written into the database, and that’s how the client learns of their existence and how to connect to them. So if a storage server thinks it has one IP address, and the client would need to use a different one to connect, then clients will never be able to connect.

(Daniel Dunbar) #4

Thanks for the details @mrz. From this and Alex’s answer, it sounds like the way to do this is with coordination/automation like what you have built: maintain the fdb.cluster and foundationdb.conf files out of band and re-distribute them as the instances change.