Two-region setup: FDB doesn't switch back to the recovered primary idc

We have the cross-region setup below to verify FDB's ability to handle idc failures.

{"regions":[{
    "datacenters":[{
        "id":"idc1",
        "priority":1
    },{
        "id":"idc2",
        "priority":1,
        "satellite":1,
        "satellite_logs":2
    },{
        "id":"idc3",
        "priority":0,
        "satellite":1,
        "satellite_logs":2
    }],
    "satellite_redundancy_mode":"one_satellite_double"
},{
    "datacenters":[{
        "id":"idc3",
        "priority":0
    }]
}]}
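
For reference, a region configuration like this is typically applied from a JSON file with fdbcli's fileconfigure command, together with usable_regions=2 for two-region operation (a minimal sketch; the file name regions.json is just an example):

    $ fdbcli
    fdb> fileconfigure regions.json
    fdb> configure usable_regions=2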

We also have read-write clients connected to the cluster.
When we shut down the fdb processes in idc1 with kill -9, we observed the read-write traffic move to idc3, which was expected.
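
(The shutdown was done with something like the following on each idc1 host; this is just a sketch and assumes killing every fdbserver process on the host is acceptable for the test:)

    # simulate a hard idc1 failure, run on each idc1 host
    $ pkill -9 fdbserver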

We then restarted the processes in idc1, simulating the recovery of idc1.
The status details output showed lines like the one below:

10.218.74.129:5018 ( 75% cpu; 90% machine; 0.059 Gbps; 93% disk IO; 2.1 GB / 8.0 GB RAM ) Storage server lagging by 126 seconds.

It seems idc1 is trying to catch up with idc3's new data.
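
One way to watch this catch-up from status json is a query like the one below (a sketch, assuming jq is installed; the per-storage data_lag field may not be present on older FDB versions):

    $ fdbcli --exec 'status json' \
        | jq '.cluster.processes[] | .address as $a | .roles[] | select(.role == "storage") | {address: $a, data_lag}'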

But this time the read-write traffic was still going to idc3; the automatic switch back to the primary described in the configuration documentation (https://apple.github.io/foundationdb/configuration.html#asymmetric-configurations) did not happen.

Does anyone have a hint about what is missing here?

BTW: when we configured idc3 with a priority of -1, the client traffic did switch back to idc1.

{"regions":[{
    "datacenters":[{
        "id":"idc1",
        "priority":1
    },{
        "id":"idc2",
        "priority":1,
        "satellite":1,
        "satellite_logs":2
    },{
        "id":"idc3",
        "priority":0,
        "satellite":1,
        "satellite_logs":2
    }],
    "satellite_redundancy_mode":"one_satellite_double"
},{
    "datacenters":[{
        "id":"idc3",
        "priority":-1
    }]
}]}

idc1 would need to catch up to idc3 for FDB to recover back to idc1. You can check status json (.cluster.datacenter_lag) to see how far behind it is, and how well it is doing in catching up.
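
For example, something like this prints the lag directly (a sketch, assuming jq is available on the machine running fdbcli):

    $ fdbcli --exec 'status json' | jq '.cluster.datacenter_lag'

The result has versions and seconds subfields, as noted in the release note below.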

If FDB tries to recover back to idc1 as soon as the processes return, then there’d be unavailability as you’d have to wait for idc1 to replay all mutations that were missed during that time.

It looks like the docs are also slightly out of sync there. From the 6.2 release notes:

  • Remove cluster.datacenter_version_difference and replace it with cluster.datacenter_lag that has subfields versions and seconds. (PR #1800: https://github.com/apple/foundationdb/pull/1800)

OK. I will retest and monitor the lag.

After the primary was recovered, even though I stopped the read-write traffic for a whole night, the status details output still showed the primary region at the remote region. The automatic switch back to the primary did not happen.

I observed that the lag after the primary recovered was negative:

"datacenter_lag" : {
            "seconds" : -0.023191999999999997,
            "versions" : -23192

Two questions:

  1. Why doesn't the automatic switch back to the primary happen?
  2. Why are the lag values negative, and what do they mean?

Someone posted this related issue: two-region how to set the primary region · Issue #6190 · apple/foundationdb · GitHub
It seems this is a bug.