What is the data re-balancing behavior when a node is temporarily not available?

(Jun Li) #1

I am doing some resiliency testing on an FDB cluster in a Kubernetes environment. A Kubernetes Pod is like a VM: each pod hosts 2 or 3 FDB server processes and an fdbmonitor. When a Pod (VM) becomes unhealthy at t0, the FDB cluster detects that the Pod is unhealthy at some time t1. Some time later, at t2, the cluster decides it needs to perform data rebalancing: it redistributes the data replicas hosted on the unhealthy pod to other servers, so that the number of replicas again meets the cluster configuration (for example, triple replication).

Under some circumstances, the Pod (VM) may become healthy again shortly after t1, for example when a network glitch recovers. If (t2 - t1) is too short, this can trigger unnecessary data rebalancing.

So my question is: what is the (t2 - t1) interval in current FDB, and can this parameter be tuned?

A related question: suppose that between t1 and t3, incoming transactions involve writes to the replica hosted on Pod-1. Since Pod-1 is not reachable, that data will be distributed to another pod (say, Pod-2). After Pod-1 becomes healthy again, will the data already moved to Pod-2 have to be shipped back to Pod-1?

(Rishabh) #2

I had the same doubt. It looks like, for storage nodes, the knob DATA_DISTRIBUTION_FAILURE_REACTION_TIME, used in storageServerFailureTracker, is the timeout for marking a pod healthy/unhealthy. I am not sure whether the same is true for a TLog-only pod, whose failure will also trigger data rebalancing.

(Alex Miller) #3

As Rishabh found, you can raise --knob_data_distribution_failure_reaction_time if you’re concerned about frequent storage server partitions that you’d prefer to wait out.
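For illustration, a knob like this can be passed on the fdbserver command line, or set in foundationdb.conf using the `knob_<name> = <value>` convention under the `[fdbserver]` section. The value of 120 below is an arbitrary example, not a recommendation; sketch only, check your FDB version’s documentation before relying on it:

```ini
# foundationdb.conf (excerpt) -- applies to all fdbserver processes
# started by fdbmonitor on this host.
[fdbserver]
# Wait longer (here: 120 seconds, an example value) before data
# distribution reacts to a storage server failure, so that brief
# network glitches do not trigger unnecessary rebalancing.
knob_data_distribution_failure_reaction_time = 120
```

The trade-off: a larger value tolerates longer transient outages without moving data, but also delays re-replication after a genuine failure, extending the window in which you have fewer than the configured number of replicas.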

TLogs are detected off of WORKER_FAILURE_TIME, which applies to mostly everything else as well. TLog failures cause recoveries, but not data rebalancing. (And, in 6.1, a recovery doesn’t even cause data distribution to be re-initialized, as it was split off into a separate role from the master.)
