Process class and machine sharing deployment questions

Before answering any of the questions, I want to define some vocabulary that I’ll be using. A role is a particular job in the cluster that can be assigned to processes, such as storage server, transaction log, proxy, etc. A process can take on more than one role.

A process class is a property of each process that determines its affinity for taking on certain roles. Depending on the class and some other factors, a process may or may not actually be assigned any roles. @guarav put together a useful matrix of how roles can be assigned to classes: Roles / Classes matrix

There shouldn’t be any safety concern with running your processes with an unspecified class, nor with leaving your desired number of various roles unspecified. The reason to set both is performance.

If you don’t set the desired number of various roles, your throughput will be limited to whatever the default number of each role can handle. If this is enough throughput for you, there shouldn’t be an issue, but if you start running into a bottleneck with one or more of the roles, you’ll need to request more.
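For example, the desired role counts can be set from fdbcli with the configure command. This is only a minimal sketch, assuming the counts below suit your workload (the available knob names also vary somewhat by FoundationDB version):

    fdb> configure logs=8 proxies=5 resolvers=2

This only expresses the desired counts; the cluster still has to find suitable processes to recruit them on.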

If you don’t set process classes, there are a couple of different things that can happen. One, I believe, is that a storage server will be recruited on every process, and other roles will be recruited alongside them on the same processes. This means that some of your storage servers will be dedicating some of their CPU to other tasks, reducing your storage server capacity. You can also be subject to spillover effects if a colocated role gets saturated. For example, if your read workload is too hot on a storage server that shares a process with a proxy, you’ll end up starving the proxy and interfering with the commit path. If your cluster is lightly loaded, I wouldn’t expect this to be a big deal, but otherwise it’s best to avoid a configuration like this.

Another issue is that when transaction logs and storage servers share a disk, you’ll get suboptimal performance. The two roles have very different disk usage patterns: the logs perform frequent small fsyncs, while the storage servers do lots of reads and less frequent fsyncs. When both are using the same disk, you’ll likely see increased latencies from the interference between the two.

In both cases, I’m not certain of the magnitude of the performance loss, but I’d recommend doing a performance test if you’re considering a non-trivial cluster with an unset layout. I do know that in the first case, having colocated roles makes it a lot harder to diagnose issues.
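As a rough sketch of how to avoid both problems (ports, paths, and the exact process mix are placeholders), foundationdb.conf lets you give each process an explicit class and point the log at its own disk via a separate datadir:

    [fdbserver.4500]
    class = transaction
    datadir = /mnt/log-disk/foundationdb/4500

    [fdbserver.4501]
    class = storage
    datadir = /mnt/data-disk/foundationdb/4501

    [fdbserver.4502]
    class = storage
    datadir = /mnt/data-disk/foundationdb/4502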

Specifying the desired number of each role is somewhat orthogonal to assigning classes to processes, for some of the reasons discussed above. Choosing the number determines how much capacity the cluster has for work of a given role type, and you’ll need to raise it to scale up if your cluster has hit a bottleneck on a role.

The cluster has a minimum requirement for each role (in some cases, 1). For roles like the transaction log, there is also a requirement that there be enough fault tolerance zones (i.e. separate machines, unless configured otherwise). If you don’t meet the minimum requirements, your cluster will fail to start up and be unavailable. Assuming you do meet them, the cluster will recruit as many of each role as it can, up to your desired amount, while satisfying various recruitment rules.
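A quick way to see whether the cluster reached its recruitment targets and how much fault tolerance it currently has is fdbcli’s status output (the exact fields vary by version):

    fdb> status details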

There’s another fault-tolerance-related issue that can arise in some configurations that’s worth mentioning here. If you request N transaction logs and you run exactly N transaction-class processes (or probably if you have only N processes of unset class), then your cluster will be at risk of availability loss if one of the transaction log processes fails in a way that causes it to repeatedly disconnect from and reconnect to the cluster. Each time the process disconnects, the cluster goes through a recovery, and when the process rejoins, it recovers again to add the transaction log back. These recoveries can take a few seconds, and if they happen frequently enough, you won’t be able to get much done.

For this reason, it can be beneficial to run 1 or 2 extra processes (depending on your desired fault tolerance) of each non-storage class, preferably on different machines. For example, a good configuration for a triple-redundant cluster would be to have N+2 transaction-class processes and M+2 stateless-class processes, where N is the desired number of transaction logs and M is the number of processes running stateless roles such as proxy, resolver, etc.
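To make that concrete with made-up numbers: if you want logs=4 and expect roughly 6 stateless roles (proxies, resolvers, master, cluster controller), you’d aim for 6 transaction-class and 8 stateless-class processes spread across machines. A per-machine foundationdb.conf in such a layout might look like this sketch (ports and the storage count are placeholders):

    [fdbserver.4500]
    class = transaction

    [fdbserver.4501]
    class = stateless

    [fdbserver.4502]
    class = storage

    [fdbserver.4503]
    class = storage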

Yes, and I think there are two ways to accomplish this. One is to exclude the process in question, since a coordinator is the only thing that can run on an excluded process. The better option is to use the coordinator class, which I don’t think is in the matrix above but I believe is only allowed to be a coordinator.
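A sketch of the second option (the address and port are placeholders): assign the class in that process’s foundationdb.conf section, then point the cluster’s coordinators at it from fdbcli:

    # foundationdb.conf on the coordinator-only host
    [fdbserver.4500]
    class = coordinator

    # then, in fdbcli
    fdb> coordinators 10.0.0.10:4500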

FoundationDB doesn’t attempt to do anything with processor affinity, so it’s up to the OS to decide where processes run. There may be some benefit to configuring affinity yourself, but I’m not sure how much to expect. One thing you may want to be careful about is running too many processes for the number of physical cores on the machine. If you manage to saturate the CPU on the entire machine, you’re likely to see undesired behavior, such as high latencies or processes being incorrectly detected as failed.
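As a sizing sketch (the numbers are purely illustrative), one simple way to leave headroom is just to define fewer fdbserver sections than physical cores, e.g. six processes on an 8-core machine, letting each section inherit its settings from the general [fdbserver] section:

    [fdbserver.4500]
    [fdbserver.4501]
    [fdbserver.4502]
    [fdbserver.4503]
    [fdbserver.4504]
    [fdbserver.4505]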
