High latency of read-only transactions at the primary DC?

We have three data centers that are physically separated from each other (denoted by L, S, and R). The round-trip network latency between any two of these datacenters (for example, L -> S or R -> S) is about 10 ms.

We set up two FDB deployments (with FDB 6.2.15), each following the "three data centers, two regions" scheme described in the FoundationDB configuration documentation (https://apple.github.io/foundationdb/configuration.html#asymmetric-configurations). In this scheme, DC1 is the primary DC, DC3 is the standby DC, and DC2 is the satellite that holds copies of the transaction logs from DC1.

  • Deployment 1: Region 1 contains DC1 and DC2, with DC1 in datacenter L and DC2 in datacenter S. Region 2 contains DC3, hosted in datacenter R. In this setup, DC1 and DC2 are in geographically separated datacenters.
  • Deployment 2: Region 1 contains DC1 and DC2, both in datacenter L but in different availability zones; the latency between two availability zones in the same datacenter is around 0.5 ms. Region 2 contains DC3, hosted in datacenter S. In this setup, DC1 and DC2 are very close together, so the DC1-DC2 latency (about 0.5 ms) is much smaller than the cross-datacenter latency of 10 ms.

For each deployment, we set up two clients, one at DC1 and one at DC3, each executing transactions against the deployed FDB cluster.

The read-only transaction latencies (in milliseconds) reported by the clients at both DCs were as follows:

Deployment 1:

| Configuration | DC1 50th | DC1 95th | DC1 99th | DC3 50th | DC3 95th | DC3 99th |
| --- | --- | --- | --- | --- | --- | --- |
| Configuration 1: Primary DC is DC1 (the original configuration) | 19 | 25 | 25 | 4 | 5 | 9 |
| Configuration 2: Primary DC is DC3 (after switching the primary DC to DC3) | 4 | 8 | 10 | 5 | 10 | 10 |

Deployment 2:

| Configuration | DC1 50th | DC1 95th | DC1 99th | DC3 50th | DC3 95th | DC3 99th |
| --- | --- | --- | --- | --- | --- | --- |
| Configuration 1: Primary DC is DC1 (the original configuration) | 5 | 10 | 10 | 6 | 10 | 10 |
| Configuration 2: Primary DC is DC3 (after switching the primary DC to DC3) | 4 | 7 | 14 | 8 | 10 | 30 |

We observed that, with Configuration 1 of Deployment 1, the latency experienced by the client located at the primary DC is much higher than the latency measured by the client at the primary DC with Configuration 1 of Deployment 2: 19/25/25 vs. 5/10/10 ms for the 50th, 95th, and 99th percentiles.

Is this latency difference of about 15 ms caused by DC2 not being in the same datacenter as DC1 in Deployment 1? Our understanding is that DC2 only stores transaction logs and thus should only affect write latency, yet it appears to affect read latency as well. Is this normal behavior?

We do have a read-only optimization implemented, as reported at the FoundationDB Summit last year (https://static.sched.com/hosted_files/foundationdbsummit2019/52/NuGraph.Built.Upon.JanusGraph.FoundationDB.Version11.pptx). As a result, the latency measured at the standby DC is always low in all four configurations above, because we cache the Global Read Versions (GRVs) in the FDB client library and use them to initialize transactions, so the latency reported at the standby DC is what we expected. However, we do not cache read versions at the primary DC at this time.
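For illustration, here is a minimal application-level sketch of this kind of GRV caching, written against the Python binding. Our actual implementation lives inside the client library; names such as CACHE_TTL_SECONDS and read_only_get are made up for this example.

```python
import time

import fdb

fdb.api_version(620)
db = fdb.open()

# Small application-level cache of the most recent read version.
# The TTL must stay well below the ~5 s MVCC window, or reads using the
# cached version will fail with transaction_too_old.
CACHE_TTL_SECONDS = 0.5
_cached_version = None
_cached_at = 0.0


def read_only_get(key):
    """Read one key, reusing a recently obtained read version when possible."""
    global _cached_version, _cached_at
    tr = db.create_transaction()
    now = time.time()
    if _cached_version is not None and now - _cached_at < CACHE_TTL_SECONDS:
        # Skip the GetReadVersion round trip; the read may be slightly stale.
        tr.set_read_version(_cached_version)
    else:
        # Pay the GRV cost once and remember the version for later transactions.
        _cached_version = tr.get_read_version().wait()
        _cached_at = now
    return tr.get(key).wait()
```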

For further details, this is the configuration when DC1 is the primary DC:

{
  "regions": [
    {
      "datacenters": [
        {
          "id": "dc1",
          "priority": 2
        },
        {
          "id": "dc2",
          "priority": 0,
          "satellite": 1
        }
      ],
      "satellite_redundancy_mode": "one_satellite_double",
      "satellite_logs": 10
    },
    {
      "datacenters": [
        {
          "id": "dc3",
          "priority": 1
        }
      ]
    }
  ]
}

and when DC3 is the primary DC:

{
  "regions": [
    {
      "datacenters": [
        {
          "id": "dc1",
          "priority": 1
        },
        {
          "id": "dc2",
          "priority": 0,
          "satellite": 1
        }
      ],
      "satellite_redundancy_mode": "one_satellite_double",
      "satellite_logs": 10
    },
    {
      "datacenters": [
        {
          "id": "dc3",
          "priority": 2
        }
      ]
    }
  ]
}

In summary, the questions that we would like to ask are:

  1. Does DC2, which hosts the satellite transaction log servers, play a role on the read path?
  2. Are there any configurations in FDB 6.2 that we need to be aware of that could lead to the high latency we observed?
  3. In general, what is a good way to troubleshoot latency issues in FDB and to find out where a transaction spends most of its time? It would be great if you could point us to some reference documentation.

Yes. GetReadVersion must verify that satellite transaction logs have not been locked, so they’re still on the path for starting a (read-only) transaction. (This is to guard against the database having already recovered into the secondary, which would have required locking transaction logs in the satellite.)

The further you put your satellite logs from your primary in a multi-region config, the higher the latency of a GetReadVersion call will be. Depending on how much you trust your clocks or how much (rare) stale data would affect your application, you could evaluate setting CAUSAL_READ_RISKY to drop the “have my transaction logs been locked” check.
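A minimal sketch of how you might set that option per transaction with the Python binding (the key and function name here are just placeholders):

```python
import fdb

fdb.api_version(620)
db = fdb.open()


@fdb.transactional
def risky_read(tr, key):
    # Drop the "have my transaction logs been locked" check from
    # GetReadVersion; a read can be stale if the cluster has just
    # recovered into the other region.
    tr.options.set_causal_read_risky()
    return tr.get(key).wait()


value = risky_read(db, b'some-key')
```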

There are debug transactions, which cause CommitDebug trace events to be logged in the various server components when that transaction is being handled. I wrote a hacky chrome://tracing based visualization for those logs as well. There’s a transaction every 10s that already has this set, so for most casual analysis, you probably don’t need to change your client code.
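If you do want to tag a specific transaction yourself, a sketch with the Python binding might look like the following; I'm assuming the debug_transaction_identifier and log_transaction transaction options here, so double-check the option names available in your client version.

```python
import fdb

fdb.api_version(620)
db = fdb.open()


@fdb.transactional
def traced_read(tr, key):
    # Tag the transaction so the server components emit CommitDebug trace
    # events for it as it is handled.
    tr.options.set_debug_transaction_identifier('latency-probe')
    tr.options.set_log_transaction()
    return tr.get(key).wait()


traced_read(db, b'some-key')
```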


Thank you very much @alexmiller for the clarification.
We should then cache the global read version at DC1 as well, trading some consistency for lower read latency.

I will also try out the tool to understand more about FDB transactions. Thanks for sharing it!

If you’re already okay with stale data even in the primary, I’d strongly suggest trying CAUSAL_READ_RISKY first. You can even default it to on for all transactions on the server via --always_causal_read_risky=1.