Feedback on New Deployment Topology

I’m interested in shrinking our minimum cluster size for new clusters and adding more fault tolerance. Today we deploy in a topology like this:

three_data_hall mode
5x pods running coordinator class
9x pods running unset class
1x pod running stateless class

All pods are spread evenly across AZs in Kubernetes (not using the operator), with stateful sets for the coordinator and unset classes and a regular k8s deployment for stateless.

We’d like to switch to this topology:

three_data_hall mode
9x pods, all pods run unset class

We would then change our automation that replaces dead coordinators so that, when more unset pods are added, coordinators can only run on pod ordinals 0-8. Pods with ordinals 9 and above can run anything but a coordinator.
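
As a rough illustration of the ordinal rule, here is a minimal sketch in Python. It assumes stateful-set style pod names that end in `-<ordinal>`; the pod name prefix `fdb-unset` and the helper names are hypothetical and not taken from our actual automation.

```python
import re

# Assumption from the post: only ordinals 0-8 may host a coordinator.
MAX_COORDINATOR_ORDINAL = 8


def pod_ordinal(pod_name):
    """Return the trailing ordinal of a stateful-set style pod name."""
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"pod name has no ordinal suffix: {pod_name!r}")
    return int(match.group(1))


def coordinator_candidates(pod_names):
    """Keep only the pods allowed to run a coordinator, lowest ordinal first."""
    eligible = [p for p in pod_names if pod_ordinal(p) <= MAX_COORDINATOR_ORDINAL]
    return sorted(eligible, key=pod_ordinal)


# Hypothetical cluster that has been scaled from 9 to 12 unset pods.
pods = [f"fdb-unset-{i}" for i in range(12)]
candidates = coordinator_candidates(pods)
```

Here `candidates` holds only `fdb-unset-0` through `fdb-unset-8`, so a replacement coordinator would never be chosen from the pods added later.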

Is this a reasonable strategy? The benefit seems pretty clear to me: we get 9 coordinators at a lower cost than with the previous deployment strategy. From what I can see in the code, there shouldn’t be any issues with running a coordinator in the same process as a TLog, because the event loop priorities make coordinator requests take priority over almost everything else. Additionally, coordinators and TLogs are not really active at the same time.

Are there any downsides to this?

As you seem to know (and we should document this better), you should run with 9 coordinators if you use three_data_hall.
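
A quick way to see why 9 is the number is majority arithmetic. The sketch below is only an illustration under two assumptions that are not spelled out in this thread: that coordinators need a strict majority to be reachable, and that three_data_hall is meant to survive the loss of one data hall plus one more machine elsewhere. The function name is hypothetical.

```python
def keeps_majority(coordinators_per_hall, extra_failures=1):
    """Check whether a majority of coordinators survives losing the
    largest data hall plus `extra_failures` coordinators elsewhere."""
    total = sum(coordinators_per_hall)
    majority = total // 2 + 1
    # Worst case: the hall holding the most coordinators goes down.
    remaining = total - max(coordinators_per_hall) - extra_failures
    return remaining >= majority


nine_spread = keeps_majority([3, 3, 3])  # 9 - 3 - 1 = 5, majority is 5
five_spread = keeps_majority([2, 2, 1])  # 5 - 2 - 1 = 2, majority is 3
```

Under those assumptions, 9 coordinators spread 3 per hall keep a majority in the worst case, while 5 coordinators spread 2/2/1 do not.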

Running everything with unset is something that will simplify your topology and FDB will try to place the roles in a reasonable way. There are some drawbacks to this strategy though:

  1. Every process will need a disk. You won’t know which processes will be recruited for storage/tlog and which will be stateless. If you have disks everywhere anyways, this might not be a problem. But if you use something like EBS, this could be very costly.
  2. Every process will need the same amount of memory. This means, for example, that you can’t have high memory instances for storages and cheaper instances for stateless and tlog roles.
  3. There’s a good chance that FDB will co-locate tlogs and storages in the same process (I am not 100% sure whether this is true).

Yes, I am aware 9 is the intended number of coordinators for three_data_hall.

  1. The net reduction of nodes here will save enough to make this a non-issue, but that could definitely be an important consideration for other users.
  2. Our deployment model is “many small clusters”, so we’d probably just add more clusters instead of pursuing that kind of optimization.
  3. This already happens in our current deployment, which is annoying in that sometimes resource utilization is not balanced, but it hasn’t been an issue otherwise.

Thanks for your help!

Any chance you’d be able to detail the motivations behind or requirements that drove this?

Our workload is fairly easy to partition across multiple clusters and we generally prefer the strict isolation this provides. This isn’t really related to FDB at all because we deploy lots of things this way.