Setting the primary data center does not work as expected

I have two FDB cluster setups: SingleDC and MultiDC (2 regions, with 1 primary DC, 1 standby DC, and 1 satellite DC).

My client issues read-only requests to each FDB cluster. With MultiDC, the client is located in the primary DC. However, the latency of the MultiDC setup is higher than that of SingleDC. In my client, I have already set the data center ID to the primary DC's ID.
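Roughly, my client setup looks like this (a sketch only; the cluster file path and the "dc-primary" ID are placeholders for my actual values):

    import com.apple.foundationdb.Database;
    import com.apple.foundationdb.FDB;

    public class ClientSetup {
        public static void main(String[] args) {
            FDB fdb = FDB.selectAPIVersion(600);                      // FDB 6.0.x
            Database db = fdb.open("/etc/foundationdb/fdb.cluster");  // placeholder path
            // Matches the datacenter_id passed to the fdbserver processes
            // running in the primary DC ("dc-primary" is a placeholder).
            db.options().setDatacenterId("dc-primary");
            // ... read-only transactions are issued against db here ...
        }
    }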

If I understand correctly, the latency of MultiDC should be comparable to SingleDC in my setup, since the workload is read-only and, with MultiDC, the client should always go to the primary DC to get the read version and perform the reads. Am I missing anything? My FDB version is 6.0.15.

Updated:

  • If I put my client at the primary DC, the latency is ~2 times higher (15 ms vs 7.5 ms).
  • If I put my client at the standby DC, the latency is ~5 times higher (~35 ms, i.e. about 20 ms more).
    (Note that these are read transactions that may read multiple key-value pairs, not single-key reads.)

If you ssh to a host in your primary DC, what’s the round-trip time to your satellite DC, and what’s the round-trip time to the remote DC? (i.e. ping them; what’s the latency?)

@alexmiller:
Round-trip time from one node in the primary DC to another node in the remote DC is ~7-8 ms.
Round-trip time from one node in the primary DC to another node in the satellite DC is ~7-8 ms.

If the transaction is read-only (and executed at the primary DC), how frequently does it go to the other DCs, or how many round trips does it usually make to them? As I understand it, it should not need to go to either the satellite DC or the remote DC, should it?

When I turned on the client trace, the event of type “TransactionTrace_GetVersion” always takes ~10 ms, even though my client is in the same DC as the primary DC and I set db.options().setDatacenterId() to the primary DC’s ID.
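For reference, this is roughly how I enable the trace (a sketch; the trace directory and the "grv-latency-probe" tag are placeholders, and I’m assuming the transaction_logging_enable option available in 6.0):

    import com.apple.foundationdb.Database;
    import com.apple.foundationdb.FDB;
    import com.apple.foundationdb.Transaction;

    public class TraceProbe {
        public static void main(String[] args) {
            FDB fdb = FDB.selectAPIVersion(600);
            // Must be set before the network starts (i.e. before open()).
            fdb.options().setTraceEnable("/var/log/fdb-client");
            Database db = fdb.open();

            try (Transaction tr = db.createTransaction()) {
                // Tag the transaction so its TransactionTrace_* events are written.
                tr.options().setTransactionLoggingEnable("grv-latency-probe");
                tr.getReadVersion().join(); // appears as TransactionTrace_GetVersion
            }
        }
    }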

When my client is at the remote DC, the latency of GetVersion is 20-30 ms.

In the experiment with a single DC, this number is around 1 ms.

For the other events, the latency looks normal to me. Does that mean my configuration is wrong in some way that causes my GetVersion latency to be so high?

Both commit and get read version requests require a round trip to the transaction logs from the proxies. If you have a setup with logs in a remote DC (like the satellites), then both of those requests will be affected accordingly.

If I’m interpreting your setup correctly that you have logs in a satellite with a round-trip time of 8ms, then the numbers you report sound reasonable to me.

Read-only transactions will need to get a read version for the first read, so they will also experience a round trip latency to the satellite DC in that case. There are various strategies that could be used to limit that effect, such as operating a satellite that is closer to your primary, reducing causal consistency guarantees by using the CAUSAL_READ_RISKY transaction option (doing so avoids the round-trip to the logs, but introduces a rare possibility that you might get a read version older than a committed version), or caching read versions and reusing them (which means you could be reading slightly stale data).
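For example, here is a minimal sketch of the CAUSAL_READ_RISKY approach with the Java bindings (the key is just a placeholder):

    import com.apple.foundationdb.Database;
    import com.apple.foundationdb.FDB;
    import java.nio.charset.StandardCharsets;

    public class CausalReadRiskyExample {
        public static void main(String[] args) {
            Database db = FDB.selectAPIVersion(600).open();

            byte[] value = db.run(tr -> {
                // Skip the proxy -> transaction log round trip when getting the
                // read version, accepting a rare chance of a slightly old version.
                tr.options().setCausalReadRisky();
                return tr.get("some-key".getBytes(StandardCharsets.UTF_8)).join();
            });

            System.out.println(value == null ? "<missing>" : new String(value, StandardCharsets.UTF_8));
        }
    }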

I was thinking about this, and thought that the GRV quorum work I did some time ago meant that this shouldn’t be required. If we get enough GRV responses back from the local DC to indicate that local TLogs could still commit, I’d… think that should be good enough. I’ve thrown my claims into correctness, and we’ll see if that turns into a PR.

Interesting, I wasn’t aware of that. Based on the reported numbers, it seems likely that it’s having to do a round-trip outside of the primary. Could there be something about their config that would make it not benefit from your change?

I retract my claim, because tolerating the loss of 1DC means we need to be able to lock 2 out of 3 DCs during recovery, which also means our GRV quorum needs to be 2 out of 3 DCs. So, GRVs do indeed need to go to satellites, and minimal quorums only work within one DC.

I tried bringing the satellite DC close to the primary DC, and the latency drops to be close to the single-DC setup (actually, they are deployed in the same data center) :slight_smile: . Unfortunately, with this change, my setup may not tolerate a data center failure.

I don’t quite understand what you are discussing, probably because I lack knowledge of how FDB works internally. For example, other than get-read-version and commit, are there other transaction operations that go across DCs? Does a read or get-range always fetch data from its own DC, or does it sometimes have to go to another DC? Does a write always go synchronously to the transaction log processes of the primary DC and the satellite DC?

Is there a document describing how a transaction is executed in a 3-DC, 2-region setup? That would be really helpful for me to get some insight into my deployment.

If I put my client at the remote DC, the latency is still higher than at the primary DC (15 ms vs 8 ms). I tried both methods, enabling causal read risky and using a background thread to pre-fetch the read version, and neither helps. Could it be because tx.commit().get() or tx.cancel() still goes to the primary DC and satellite DC? If my transaction is read-only, could it skip tx.commit().get() and tx.cancel()?

During recovery, there’s a Lock RPC that we make to TLogs to tell them that there’s now a newer generation of TLogs and that they should stop accepting new commits. Getting a read version checks to make sure that the TLogs aren’t locked, and therefore we can decrease latency a bit by using a minimal quorum for this. As recoveries are far less frequent than getting a read version, we force recovery to lock a large number of TLogs and GRV to check that a minimal number aren’t locked. I was thinking that this quorum check for GRVs should have been fine with a quorum of responses from one DC, but then later reflected, and realized that it does need to wait for responses from two DCs. So no cheap easy optimization for me. :man_shrugging:

Correct. Clients in the secondary DC are expected to have higher latency, as starting a transaction and committing a transaction still require going to the primary datacenter. The other operations that a transaction would do are reads, and those should be able to be served from the local DC, as long as there are non-failed storage servers available locally.

If your transaction is read-only, tx.commit() is already a no-op, won’t actually send any packets, and should return a future that’s immediately ready.
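For example, a read-only sketch in the Java bindings that never calls commit() (the key is a placeholder):

    import com.apple.foundationdb.Database;
    import com.apple.foundationdb.FDB;
    import com.apple.foundationdb.Transaction;

    public class ReadOnlyExample {
        public static void main(String[] args) {
            Database db = FDB.selectAPIVersion(600).open();

            try (Transaction tr = db.createTransaction()) {
                byte[] v = tr.get("some-key".getBytes()).join();
                // No commit() here: for a read-only transaction, commit() would be
                // a no-op returning an already-completed future.
                System.out.println(v == null ? "<missing>" : "found " + v.length + " bytes");
            }
        }
    }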

Based on your answer, with a read-only transaction from the remote DC, committing a transaction does not go to the primary DC anymore.

How about starting a transaction? Currently, I initialize a transaction with Transaction tx = db.createTransaction();, so I guess FDB does not know up front whether this is a read-only or read-write transaction and therefore still spends one round trip to the primary DC. Is that correct?

If I know the transaction is read-only, is there a way to hint this at the beginning so that starting the transaction only accesses the local DC?

If I use the read transaction from tx.snapshot(), will it solve the issue?

A transaction needs some read version to perform reads. Your options are to let the client get it automatically (incurring the round trip to the primary DC), or to use some flavor of re-using an older read version, as outlined below, which may read potentially stale data.
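As a rough illustration of re-using a cached read version (a sketch only; the refresh interval and key are placeholders, and a version older than the roughly five-second MVCC window will cause reads to fail):

    import com.apple.foundationdb.Database;
    import com.apple.foundationdb.FDB;
    import com.apple.foundationdb.Transaction;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class CachedReadVersionExample {
        private static final AtomicLong cachedVersion = new AtomicLong(-1);

        public static void main(String[] args) throws Exception {
            Database db = FDB.selectAPIVersion(600).open();

            // Background thread refreshes the cached read version (placeholder interval).
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try (Transaction tr = db.createTransaction()) {
                    cachedVersion.set(tr.getReadVersion().join());
                }
            }, 0, 1, TimeUnit.SECONDS);

            Thread.sleep(200); // let the first refresh land (sketch only)

            try (Transaction tr = db.createTransaction()) {
                // Reuse the cached version: no GRV round trip to the primary DC,
                // at the cost of potentially reading slightly stale data.
                tr.setReadVersion(cachedVersion.get());
                byte[] v = tr.get("some-key".getBytes()).join();
                System.out.println(v == null ? "<missing>" : "found");
            }

            scheduler.shutdown();
        }
    }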

snapshot is a different concern. Snapshot does not record read conflict ranges for the reads you perform. You still need some read version to perform those reads at.

There will eventually be improvements to doing this, as well. Being able to get a stale read version from a non-primary DC locally is #1006. Better read version caching is #1310.