When we run the fdbdr status command on a DR that has not reached the differential stage, it will report that the DR is not complete, but also give a number of seconds behind. It’s not clear to me what this seconds behind means, and it does not seem to provide an estimate of how long it will take to be caught up. Is this something we should remove from the status, or does it provide useful information?
Yeah that definitely is confusing.
Seconds behind always refers only to the mutation log. It is exactly (latest committed version on the primary - latest mutation version committed to the secondary such that all prior mutations are also committed) / versions per second.
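To make that formula concrete, here is a minimal Python sketch. The function and parameter names are made up for illustration and are not taken from the fdbdr source, and the rate of 1,000,000 versions per second is an assumption about the default setting.

```python
# Assumed nominal version rate; treat this as a configurable value.
VERSIONS_PER_SECOND = 1_000_000


def seconds_behind(primary_committed_version: int,
                   secondary_contiguous_version: int,
                   versions_per_second: int = VERSIONS_PER_SECOND) -> float:
    """Mutation log lag in seconds, as described above.

    secondary_contiguous_version is the latest mutation version committed
    to the secondary such that all prior mutations are also committed.
    """
    version_gap = primary_committed_version - secondary_contiguous_version
    return version_gap / versions_per_second


# Hypothetical versions: the secondary is 3,000,000 versions behind.
print(seconds_behind(5_000_000, 2_000_000))  # 3.0
```

Note that nothing in this calculation looks at the initial range copy, which is why the number says nothing about how complete the DR is.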
Because versions on the primary can advance faster than the nominal versions-per-second rate due to recoveries, seconds behind is at best an approximation of the mutation log delta between the two clusters. It also doesn’t tell you how much data is in that interval or how fast it is being written to the remote side, although these things could be estimated as well.
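As a rough illustration of the recovery effect, the sketch below uses invented numbers. The size of the version jump and the nominal rate are assumptions chosen only to show the shape of the problem, not values read from a real cluster.

```python
# Assumed nominal version rate.
VERSIONS_PER_SECOND = 1_000_000


def reported_lag_seconds(version_gap: int) -> float:
    """Convert a version gap into the 'seconds behind' figure."""
    return version_gap / VERSIONS_PER_SECOND


# Hypothetical: the secondary is really about 2 seconds of wall-clock time behind.
normal_gap = 2 * VERSIONS_PER_SECOND

# Hypothetical: a recovery on the primary advances the version by an
# extra 50,000,000 versions almost instantly.
recovery_jump = 50_000_000

print(reported_lag_seconds(normal_gap))                  # 2.0
print(reported_lag_seconds(normal_gap + recovery_jump))  # 52.0
```

In the second case the reported lag grows by 50 seconds even though no additional wall-clock time has passed and no extra data was written in that version range.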
That said, if the DR has not reached the differential stage, even a mutation log lag of 0 still means the secondary is not usable for failover, as it is missing ranges of data. I think it is still useful to report seconds behind before the differential stage, but we should perhaps rename it or otherwise make it clear that it refers only to the mutation log.
Status could potentially publish an estimate for the completion of the initial keyspace copy using the size of the primary cluster, the range bytes copied so far in the DR, and how long copying those bytes took. Or maybe something involving recent committed throughput to the secondary.
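A minimal sketch of the first idea, assuming the three inputs are available from status. The function and parameter names are hypothetical, and it assumes the copy rate stays roughly constant and the primary does not grow much during the copy.

```python
from typing import Optional


def estimate_copy_seconds_remaining(primary_bytes: int,
                                    range_bytes_copied: int,
                                    copy_elapsed_seconds: float) -> Optional[float]:
    """Estimate seconds left in the initial keyspace copy.

    Returns None when there is not yet enough progress to compute a rate.
    """
    if range_bytes_copied <= 0 or copy_elapsed_seconds <= 0:
        return None
    # Average copy rate observed so far, in bytes per second.
    bytes_per_second = range_bytes_copied / copy_elapsed_seconds
    # Clamp at zero in case the primary shrank or the size estimate is stale.
    remaining_bytes = max(primary_bytes - range_bytes_copied, 0)
    return remaining_bytes / bytes_per_second


# Hypothetical numbers: 250 of 1000 bytes copied in 50 seconds.
print(estimate_copy_seconds_remaining(1000, 250, 50))  # 150.0
```

Using recent committed throughput to the secondary instead of the lifetime average would replace `bytes_per_second` with a windowed rate, which would react faster if the copy speeds up or slows down.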