FoundationDB

Searching for more information


(Marc-André Tremblay) #1

Hey,

I would really like to operate a FDB cluster. In order to do so, I need to gain better understanding of process classes.

  • storage
  • transaction
    • log server
      • active
      • standby
  • stateless
    • proxy
    • resolver
    • cluster controller
    • master
    • standby

Searching the documentation for such terms does not yield relevant information. I’d like to know:

  • the properties and intricacies of each of those roles
  • how they map to the architecture diagram
  • how processes get selected to enact those

I already did RTM. Can someone point me in the right direction?

Thank you very much,
Marc


(Alec Grieser) #2

Well, here’s a first crack at it. I’m not super fluent in all of those, but here’s a rough overview.

Here goes:

  • storage - These map pretty well to the storage nodes within the diagram. They are the main storage space for data in FDB. All reads are served directly from the storage servers (possibly from in-memory caches maintained by the servers). They use the configured storage engine, so, for example, if you are using the SSD engine, then each node is in charge of managing one B-tree. It will fsync at a rate of about 1 fsync a second, I believe.
  • transaction - These are the transaction log servers. They are part of (but are not all of) the transaction authority. Every transaction must be written (and fsync’ed) to ${replication factor} transaction logs before being returned successfully. As a result, they are constantly doing I/O, so it is probably best to give each one its own disk (and possibly a higher quality disk than is used for storage servers). An active log is being used, while a standby log is idle, but it remains available in case a machine with a log dies and one needs to be replaced.
  • stateless - These, as the name suggests, do not require access to persistent volumes and don’t do any disk I/O. Most of these roles are part of the transaction subsystem, though not all of them. They do the following things:
    • standby - Currently idle, but ready in case something is needed.
    • proxy - All reads must talk to a proxy to get a read version at the beginning of a transaction, and commits go through the proxy. It will coordinate further communication between servers in the cluster.
    • master - Partially part of the transaction subsystem. It’s most important role within the commit pipeline is to hand out commit versions. Proxies ask the master for a version after receiving commit requests, which is how version monotonicity is maintained. Outside of the commit pipeline, it has some jobs balancing data across storage servers and monitoring usage to know if we need to apply back pressure to incoming transactions.
    • resolver - Part of the transaction subsystem. In particular, it is the process that is in charge of determining which transactions conflict with other ones. Proxies (after getting a commit version) send read and write conflict ranges for each transaction to the resolver, and any transaction that read data that was mutated between when the transaction is started and when it is committed is failed (which is how the ACI properties are guaranteed).
    • cluster controller - Not part of the transaction subsystem. It’s main job is to to do failure monitoring.

To select roles, the following is done:

  • The cluster controller is selected by the coordinators (who are not identified with a special class but are identified instead by the cluster file). There is a leader election process that chooses one process from those that join the cluster. It will favor stateless processes if one is available, but it will use a storage server or transaction log if need-be. Once it’s known, other processes can talk to the coordinators to figure out who the current cluster controller is.
  • The master is selected by the cluster controller. The other processes talk to the cluster controller (for failure monitoring purposes), and it’s from this list that the cluster controller picks the master.
  • The master selects the rest of the processes using the failure monitoring information kept by the cluster controller. When selecting roles, the master will prioritize recruiting servers in ways that are consistent with their declared process class, but it might ignore the process class if there aren’t enough alive. For example, if there are only two (live) processes in the cluster marked as transaction class, and the replication mode is “triple”, then the master might recruit a transaction log on a “storage” class server. But it generally avoids this. (For example, if you configure a cluster to use 7 logs in triple replication, and there are only 5 transaction class servers, then it will recruit those 5 only because it requires 3 for durability, and it’s considered worse to put a transaction log on a storage class process than run with two fewer logs than is configured.)

There are some details elided here, but that’s the basic idea. There is also work done to do re-recruitments (called recoveries) when processes die, but that uses the same basic process as is done to recruit the processes initially (with extra work to make sure that the data that were persisted to disk in the previous generation are available in subsequent generations).


(A.J. Beamon) #3

I think it’s also worth mentioning explicitly that classes are slightly different than roles. Specifically, a process’s class determines which roles it can take.

Storage class processes always run a storage role I believe, and could potentially run others too if there is no more suitable fit for them.

Transaction class process are where transaction logs prefer to go, but not every such process will run a log depending on configuration.

Stateless class processes are a preferred destination for the things Alec listed.

I think there are other classes which can target specific stateless roles as well.