Questions about process classes, recruitment behavior and cluster configuration

(mehmet onsiper) #1

Hello! We have been experimenting with FDB for a while and would like to use it as our primary DB. It seems like it’d be greatly beneficial to have at least an elementary understanding of how user-specified process classes and automatic recruitment work together. I have a few questions to clarify my understanding, and also would like to validate some rules-of-thumb I extracted out of prior discussions in this forum. So here goes:

  1. Can a process with an assigned class (i.e. not unset) still get recruited to another class if there is a need? My understanding is that user-specified assignments are more like guidelines, and can be overridden in run-time if necessary. Is that correct?

  2. This is a follow-up question to the above. Let’s say I have a host with a single storage device and two processes whose classes are transaction and stateless. Does such an assignment imply that this host will not store any data (modulo transaction logs), or does it simply state that this host prioritizes transaction log processing over data storage? If the behavior is the former, one would need to take this fact into account when thinking about replicas and such.

  3. Does it ever make sense to configure a host so that num processes > num cores? In such a case, my understanding is that extra processes will mostly stay idle and only do meaningful work if the first num cores processes are all stuck waiting on things. Is that even plausible? I’d think processes would refrain from synchronous waits so that such a situation is avoided. Therefore, I’m inclined to use num processes == num cores. What am I missing?

  4. After studying some discussions on this forum, here are the rules-of-thumb I extracted to come up with a reasonable “default configuration” for our FDB clusters. I’d be glad if you could eye them and let me know if any of them doesn’t make sense.

  • num disks = num storage processes + num log processes for each host
  • num total proxy processes = num total log processes
  • Proxy and log processes are not on the same host (so that they don’t compete for BW)
  • Storage and log processes do not use the same storage device (so that they don’t compete)
  • When using SSD storage devices, num of storage processes / num of log processes should roughly be around 8.
  • Any extra processes that we don’t know what to do with should be of the class stateless instead of unset so that they do not get auto-assigned to storage.

I know this is a long post… thanks in advance for all the responses; they will be greatly appreciated.

(Alex Miller) #2

The manual assignments of process classes is largely to work around the fact that FDB can be terrible about how it recruits roles in your cluster by default. It’s a thing that I’d really like to see improved sometime, but no one is actively working on it.

I’d honestly just recommend looking at the code for this. Some things can be overridden in a time of great need, some can’t.

It would imply that the host would only store transaction logs.

Overall, I’d recommend against it, because you’ll probably just see an increase in latency for no real gains. There’s nothing that stops FDB from recruiting more than one role for a given process, and it’d likely be a bit better to have two roles in one process than two processes on one core. The only reason for concern that I would have is that they’d then share the process memory limit, but as FDB runs fine entirely out of a single process, that’s not a large concern that I’d have.

You’ll find that it’s sometimes advantageous to run >1 storage servers per disk, because one alone won’t be able to saturate the disk IOPS. So maybe more like num disks = num SS processes / SS per disk + num log processes. (Assuming that you have more cores than disks to support the extra processes.)

When using ssd storage engine, storage/log = ~8, on physical hardware that’s vaguely around what I’ve previously benchmarked. Memory storage engine has a ratio of about 2.

Also make sure that you use log instead of transaction as the process class for transaction logs. transaction is basically proxy+log, which I’ve only not pushed for its removal because it’s rather convenient when measuring proxy vs log ratios so that you don’t have to constantly change your process classes.

I’ll also add that your number of logs and proxies should probably be roughly equal, as benchmarks I’ve done have showed diminishing returns from deviating from the two being equal. This is quite possibly workload dependent though.

Use the minimum number of resolvers that you can. 1 is ideal, 2 if it measurably improves your workload, 8 is right out.

Having some excessive stateless processes is always convenient. If you’re looking into multi-dc, remote regions can be relatively stateless process hungry, because log routers can be recruited onto stateless processes if you haven’t allocated them their own processes. In 6.1 and beyond, you’ll see more things that used to be combined moved off into their own recruitable roles (e.g. Data distribution in 6.1, and ratekeeper will probably follow), both of which will end up in stateless unless otherwise told.

(mehmet onsiper) #3

So, in a cluster with identical nodes that have a single storage device, at least one node will need to have log and storage processes running together and competing for the same device; unless one is willing to have that node/device just doing transaction logs. However, such an allocation doesn’t make a lot of sense to me – it sounds like we’d be leaving significant storage space unused. Am I thinking correctly?

BTW, the link you gave was really helpful! We will probably start out with a small cluster with one log process, which worried me about having a single point of failure. However, looking at the code, I see storage nodes can be recruited (even though it is a ProcessClass::WorstFit) for this purpose if push comes to shove. That should preserve availability until the crashed node comes back, right?

One final question: Do processes have a single class at a given time, or can they do two things at once? Take the above scenario as an example. Assume that the host housing the log process goes down, and that a storage process on some other node is recruited for this purpose. Does the auto-recruited process do both data storage and transaction logs now, or does it stop doing data storage? What happens when the crashed node (and its log process) comes back? Does the auto-recruited node go back to being storage (or storage-only)?

Thanks for the prompt and insightful reply!

(Alex Miller) #4

Unfortunately, disks have a fixed amount of IOPS and space that you purchase together. A transaction log is disk-time hungry, but (normally) rather unconcerned about the amount of space it has. There is a particular case though, if a storage server dies and its responsibilities are being moved or if you’re in a multi-region setup and the remote region dies, where the amount of space a transaction log has to buffer quite a large number of commits would become of concern.

If you’re running on SSDs, then it’s also a plus that the drive is mostly empty, because it means the disk can efficiently wear level itself, and transaction logs do a lot of writes.

One log would mean single replication, so you’d lose your database if you lose one node. Perhaps I’d suggest using double replication and two logs? Then if you lose one log, you would end up having a log recruited alongside a storage server until the failed log-class process comes back.

FDB processes start up a “worker” that can spawn the zero or more of the actors that make up an FDB database. A role is a defined responsibility that is a part of FDB, e.g. proxy, log, storage server, etc. A process class is a preference of what roles we’d like to have recruited on that given process. FDB is free to recruit more than one role per process if it needs to. The extreme example of this is that you can run the entire database in one process, which is in fact what the installers do to provide a local FDB instance to play with.

So for your particular example, when a storage process gets recruited as a log and a storage server, it is running both concurrently. If your log process comes back, FDB will see that a more preferred configuration exists, and then trigger a recovery so that we can recruit a log on the newly available process and eventually abandon the old one.

(mehmet onsiper) #5

Thank you, this answers all my questions!