I’m interested in shrinking our minimum cluster size for new clusters and adding more fault tolerance. Today we deploy in a topology like this:
5x pods running coordinator class
9x pods running unset class
1x pod running stateless class
All pods are spread evenly across AZs in Kubernetes (not using the operator), using StatefulSets for the coordinator and unset classes and a regular Kubernetes Deployment for the stateless class.
We’d like to switch to this topology:
9x pods, all pods run unset class
We would then change our automation that replaces dead coordinators so that, as we add more unset pods, coordinators can only run on pod ordinals 0-8. Pods with ordinals 9 and above can run anything except a coordinator.
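To make the ordinal rule concrete, here is a minimal sketch of the check our automation would apply. It assumes pod names follow the StatefulSet convention `<name>-<ordinal>`. The pod names (`fdb-unset-3` and so on) and the helper names are hypothetical, and this is not our real automation code.

```python
import re

# Assumption: only ordinals 0-8 may host a coordinator.
MAX_COORDINATOR_ORDINAL = 8


def pod_ordinal(pod_name: str) -> int:
    """Extract the trailing StatefulSet ordinal from a pod name."""
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"no ordinal suffix in pod name: {pod_name!r}")
    return int(match.group(1))


def coordinator_eligible(pod_name: str) -> bool:
    """True if this pod is allowed to run a coordinator."""
    return pod_ordinal(pod_name) <= MAX_COORDINATOR_ORDINAL


def pick_replacement(healthy_pods, current_coordinators):
    """Return the lowest-ordinal healthy, eligible pod that is not
    already a coordinator, or None if no such pod exists."""
    candidates = sorted(
        (p for p in healthy_pods
         if coordinator_eligible(p) and p not in current_coordinators),
        key=pod_ordinal,
    )
    return candidates[0] if candidates else None
```

With this rule, a healthy pod at ordinal 10 is never chosen, even when it is the only pod available. In that case the automation would have to wait for one of pods 0-8 to come back.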
Is this a reasonable strategy? The benefit seems clear to me: we get 9 coordinators at a lower cost than with the previous deployment strategy. From what I can see in the code, there shouldn’t be any issues with running a coordinator in the same process as a TLog, because the event loop priorities make coordinator requests take priority over almost everything else. Additionally, coordinators and TLogs are not really active at the same time.
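For reference, the arithmetic behind the fault-tolerance claim, as a small sketch. It assumes the coordinators need a simple majority to stay available. The function name is made up for illustration.

```python
def tolerated_coordinator_failures(coordinator_count: int) -> int:
    """Largest number of coordinators that can fail while a strict
    majority remains (assumes a simple majority quorum)."""
    return (coordinator_count - 1) // 2


# Current topology: 5 coordinators, 15 pods in total (5 + 9 + 1).
current = tolerated_coordinator_failures(5)

# Proposed topology: 9 coordinators on 9 pods in total.
proposed = tolerated_coordinator_failures(9)
```

Under that assumption, the current layout survives the loss of 2 coordinators and the proposed one survives the loss of 4, with 6 fewer pods.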
Are there any downsides to this?