SS / tLog kick-out and rejoin

(gaurav) #1

Hi, I have a few questions on the handling of processes being unreachable from the cluster:

  • What is the protocol/semantics for handling a storage server or a tLog that is unreachable for an extended duration (say, minutes, hours, or days) and eventually rejoins the cluster?
  • How soon after an SS becomes unreachable will the Master/DataDistributor/CC consider it “dead” and start extra replication of its data?
  • If an unreachable SS process is not explicitly excluded via fdbcli, does the rest of the cluster pay any cost (for example, tLogs not popping mutations while they wait for the unreachable SS to come back)?
  • Once the SS comes back alive and tries to rejoin the cluster on its own, will the Master/CC recruit it as a fresh SS process, clear its stored disk data, and then give it shards to replicate from other SS nodes?

It would be great to get a better idea of how fdb handles this.

(A.J. Beamon) #2

On how soon an unreachable SS is considered dead: I don’t know the exact numbers, but it’s something on the order of seconds.

If the process is dead, I don’t think there is much difference between excluding it and not excluding it. If the process comes back before its data is fully re-replicated, then the behavior would be different: an excluded process will remain excluded, while a non-excluded one would rejoin as described below.

If all of the storage server’s data has been re-replicated by the time it returns, then its data files will be deleted and it will start fresh. Otherwise, the storage server will retain its data files and maintain responsibility for any data that hadn’t yet been re-replicated. It’s possible that this is only a small portion of the original data assigned to that process, in which case much of the file will be unused and will have to be reclaimed for reuse.

Additionally, the cluster will attempt to move new data onto this process to replace data that had been moved away in order to keep the logical amount of data stored on each process balanced.
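
To make the rejoin rule above concrete, here is a minimal sketch in Python. It is only a toy model of the behavior described in this post, not FoundationDB code: the function name `storage_rejoin_plan`, the action strings, and the shard labels are all made up for illustration.

```python
def storage_rejoin_plan(shards_on_disk, shards_re_replicated):
    """Decide what a returning storage server does with its data files.

    Both arguments are sets of hypothetical shard labels. This models only
    the rule described in the post, not FoundationDB's actual implementation.
    """
    # Shards the cluster has not yet copied elsewhere stay this server's job.
    still_responsible = sorted(set(shards_on_disk) - set(shards_re_replicated))
    if not still_responsible:
        # Everything was re-replicated: files are deleted, server starts fresh.
        return "delete_files", still_responsible
    # Otherwise the files are kept; space used by moved-away shards is
    # unused and has to be reclaimed later.
    return "keep_files", still_responsible


print(storage_rejoin_plan({"a", "b", "c"}, {"a", "b", "c"}))  # ('delete_files', [])
print(storage_rejoin_plan({"a", "b", "c"}, {"a"}))            # ('keep_files', ['b', 'c'])
```

In the second call only shard "a" was moved away in time, so the server keeps its files and stays responsible for "b" and "c".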

On the general question of a process being unreachable for an extended period: I think this was mostly answered above for the storage server. When this happens to a log or any stateless process, the cluster will perform a recovery and recruit a new transaction subsystem to replace the old one. If a log rejoins before the data in its files is made unnecessary, then it may also participate in the recovery, though I’m not certain of all the details. If it rejoins after the recovery from old logs is completed, though, then it too would be able to delete its files.
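
The two log cases can be sketched the same way. Again, this is a hedged illustration of the description in this post rather than the real recovery logic, whose details are not fully known here; `LogRejoinAction` and `log_rejoin_action` are hypothetical names.

```python
from enum import Enum


class LogRejoinAction(Enum):
    # Hypothetical labels for the two outcomes described in the post.
    MAY_JOIN_RECOVERY = "may participate in the recovery"
    DELETE_FILES = "can delete its files"


def log_rejoin_action(recovery_from_old_logs_done: bool) -> LogRejoinAction:
    """Pick the outcome for a log process that comes back after an outage."""
    if recovery_from_old_logs_done:
        # The old log data is no longer needed, so the files can go.
        return LogRejoinAction.DELETE_FILES
    # The data in its files may still be needed by the ongoing recovery.
    return LogRejoinAction.MAY_JOIN_RECOVERY


print(log_rejoin_action(False).value)  # may participate in the recovery
print(log_rejoin_action(True).value)   # can delete its files
```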