How to Find Out Whether the FDB Client Is Talking to a Remote Active Region for GetReadVersion in a Two-Region Configuration?

jltz · April 11, 2019, 9:50am

We are considering the two-region multi-datacenter FDB cluster configuration, described in the architecture document, https://apple.github.io/foundationdb/configuration.html#configuring-regions.

Assume that we have three DCs, A, B, and C, in two regions, West Coast and East Coast. The West Coast region contains A and B, and the East Coast region contains C. We need to deploy the Client application in both regions. Suppose the West Coast region is the active region. A Client hosted in the West Coast region will have its reads handled by the proxy servers and storage servers in the West Coast region. My concern is with the Clients hosted in the East Coast region. According to the architecture document, “Reads can be served from either region, and clients can get data from whichever region is closer. Getting a read version from the east coast region will still require communicating with a west coast datacenter. Clients can cache read versions if they can tolerate reading stale data to avoid waiting on read versions.”

Each read transaction requires a read version. A Client hosted in the East Coast region will therefore experience high latency from the cross-region round trip, unless a transaction bundles many reads so that the latency cost of getting the read version is amortized. We would have to live with this constraint.

To give the Client application similar latency-related QoS in both regions, our application can tolerate reading relatively stale data, say 1 second behind the latest update. Following the statement “Clients can cache read versions if they can tolerate reading stale data”, we would need the following logic in the Client’s program: if the current active region is a remote region, then re-use the read version from the most recent transaction (but only if it was obtained no more than 5 seconds ago).

My questions are the following:
(1) What FDB client-side API support allows us to check whether the active region is remote to the Client, so that we can turn on the read version caching logic in the Client program?
(2) Due to the 5-second transaction time limit, the Client program cannot re-use a read version forever. Every time the program has to go back to the active region (West Coast) for a fresh read version, the 95th- or 99th-percentile latency will increase. To maintain the QoS, we would need a background thread that keeps making the GetReadVersion call to the active region, so that the Client always has a recent read version. Is this a good way to reduce the 95th- or 99th-percentile latency?

alexmiller · April 11, 2019, 9:46pm

There are two potential answers to this, but as neither of them has actually been implemented yet, the real answer for now is that you’d need to do the read version caching manually yourself.

The future improvements will be:

- Add native support for caching read versions, which will then give you an easy way to do read version caching The Right Way™.
- Continue providing read versions during recovery, which, although it doesn’t sound like it, is ~98% of the work needed to allow stale read versions to be acquired locally in a remote datacenter.

Correct. Refreshing your read version about once a second would seem pretty reasonable to me, but overall, the background thread that refreshes the read version that all your transactions use is the right approach.
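A minimal sketch of that background-refresh approach in Java, using only the standard library. The class name `ReadVersionCache` and the `fetcher` hook are hypothetical; the hook stands in for whatever call obtains a read version from the cluster. With the FDB Java binding I would expect it to be wired as `() -> db.runAsync(tr -> tr.getReadVersion())`, and a transaction would apply the cached value with `cache.freshVersion().ifPresent(tr::setReadVersion)`, but treat that wiring as an assumption to verify against your binding version.

```java
import java.util.OptionalLong;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Keeps the most recent read version and refuses to hand out one older than maxAgeMillis. */
final class ReadVersionCache {
    /** Immutable pair so the version and its timestamp are always read together. */
    private static final class Entry {
        final long version;
        final long requestedAtMillis;
        Entry(long version, long requestedAtMillis) {
            this.version = version;
            this.requestedAtMillis = requestedAtMillis;
        }
    }

    private final Supplier<CompletableFuture<Long>> fetcher; // hypothetical hook, see lead-in
    private final LongSupplier clockMillis;                  // injectable clock, for testing
    private final long maxAgeMillis;
    private volatile Entry latest;                           // null until the first successful refresh

    ReadVersionCache(Supplier<CompletableFuture<Long>> fetcher,
                     LongSupplier clockMillis, long maxAgeMillis) {
        this.fetcher = fetcher;
        this.clockMillis = clockMillis;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Fetches one read version; the returned future completes once the cache is updated. */
    CompletableFuture<Void> refreshOnce() {
        // Timestamp taken before the request, so the age is over-estimated rather than under-estimated.
        long requestedAt = clockMillis.getAsLong();
        return fetcher.get().thenAccept(v -> latest = new Entry(v, requestedAt));
    }

    /** Returns the cached version if it is fresh enough, otherwise empty. */
    OptionalLong freshVersion() {
        Entry e = latest;
        if (e == null || clockMillis.getAsLong() - e.requestedAtMillis > maxAgeMillis) {
            return OptionalLong.empty(); // caller lets the transaction fetch its own read version
        }
        return OptionalLong.of(e.version);
    }

    /** Starts a daemon thread that refreshes the cache, waiting periodMillis between refreshes. */
    ScheduledExecutorService startBackgroundRefresh(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "read-version-refresher");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(() -> {
            try {
                refreshOnce().join(); // join() keeps refreshes from overlapping
            } catch (RuntimeException ignored) {
                // Keep the previous entry; it expires on its own once it exceeds maxAgeMillis.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

With a refresh period of about one second, `maxAgeMillis` would presumably be set somewhat above the period and well below the 5-second transaction limit, since a transaction started on an old read version has correspondingly less time left to finish.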

jltz · April 12, 2019, 5:18pm

Thanks, Alex, for the response. With your confirmation of the design, I am looking at the current FDB Java binding APIs to see whether they support what I need.

(1) FDBTransaction already has a getReadVersion method, and the generic Transaction interface already has setReadVersion. That allows a background thread in the transaction manager to create a dummy transaction and fetch the most recent read version. I do not need to wait for native support for caching read versions.

(2) For a Client application deployed in the current active region, I would like not to use the cached read version. Instead, the transaction will issue getReadVersion directly; since it is local to the active region, the overhead of this getReadVersion can be ignored for now. Only a Client application deployed in the current standby region needs to turn on read version caching. So the Client application needs a mechanism to (a) ask for the region ID of the active region, and then (b) compare that region ID with the region ID it was assigned when it started.

In the current FDB Java binding, DatabaseOptions.java provides setDatacenterID(.), but no setRegionID(.). And I cannot find a Java class that provides getRegionID(.) or getActiveRegionID(.).

So the implementation-related question is: if I want to implement getActiveRegionID(.), do I have to go to the C++ library binding?

alexmiller · April 12, 2019, 8:08pm

DatacenterID is what you want. A “region” is basically a datacenter with one or more optional satellites. I’d recommend giving clients in a datacenter the same datacenter ID as the storage servers in that datacenter.

It looks like you can set ACCESS_SYSTEM_KEYS and read \xff/primaryDatacenter from a client to get the dcId of the current primary, but we should probably make a proper API for that. (I’m not currently sitting somewhere that I have an easily accessible multi-region cluster to test that on, so I’ll confirm once I do.)

ajbeamon · April 12, 2019, 8:28pm

I’d recommend using READ_SYSTEM_KEYS if you don’t need write access.
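The comparison discussed above (the client's own datacenter ID against the dcId of the current primary) could be sketched like this. `ReadVersionPolicy` is a hypothetical helper, not part of the FDB binding, and it assumes the primary's dcId has already been read as raw bytes. Treating an unknown primary as "do not cache" is my own assumption, chosen because it falls back to the normal, non-stale read path.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Decides whether a client should use cached read versions (hypothetical helper). */
final class ReadVersionPolicy {
    private final byte[] localDcId;

    /** localDcId is the datacenter ID this client was configured with at startup. */
    ReadVersionPolicy(String localDcId) {
        this.localDcId = localDcId.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Returns true only when the primary is known and is a different datacenter,
     * i.e. when fetching a read version would cross regions.
     */
    boolean shouldUseCachedReadVersion(byte[] primaryDcId) {
        if (primaryDcId == null) {
            return false; // primary unknown: assume the safe path, a normal getReadVersion
        }
        return !Arrays.equals(localDcId, primaryDcId);
    }
}
```

Because the primary can change after a failover, the check would need to be re-evaluated periodically rather than once at startup.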

jltz · April 13, 2019, 1:52am

We are deploying a three-datacenter two-region FDB cluster. I will try out the retrieval of the system keys and let you know.

jltz · June 14, 2019, 9:04pm

Here are the steps that I figured out to retrieve the primary DC in fdbcli. We also implemented the same steps with the FDB Java binding.

//turn on system key access option
fdb> option on ACCESS_SYSTEM_KEYS

//to retrieve the special key
fdb> get \xff/primaryDatacenter

//then turn off the system key again
fdb> option off ACCESS_SYSTEM_KEYS

The second step, the get, returns a result like the following:

\xff/primaryDatacenter is dc3
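For readers who want the Java side, here is a rough sketch of the same lookup, not the code we actually deployed. `PrimaryDatacenterLookup` and its `reader` hook are hypothetical names; the hook stands in for a get issued with system-key read access. With the Java binding I would expect the wiring to be `k -> db.runAsync(tr -> { tr.options().setReadSystemKeys(); return tr.get(k); })`, following the READ_SYSTEM_KEYS suggestion earlier in the thread, but please verify that against your binding version. The sketch also assumes the value is a printable dcId such as `dc3`.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Builds the system key for the primary datacenter and decodes its value (hypothetical helper). */
final class PrimaryDatacenterLookup {
    /** Raw bytes of the key typed as \xff/primaryDatacenter in fdbcli: byte 0xFF, then the ASCII text. */
    static byte[] key() {
        byte[] suffix = "/primaryDatacenter".getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[suffix.length + 1];
        key[0] = (byte) 0xFF;
        System.arraycopy(suffix, 0, key, 1, suffix.length);
        return key;
    }

    /**
     * Reads the key through the supplied reader and decodes the dcId.
     * The future yields null when the key has no value.
     */
    static CompletableFuture<String> lookup(Function<byte[], CompletableFuture<byte[]>> reader) {
        return reader.apply(key())
                .thenApply(v -> v == null ? null : new String(v, StandardCharsets.UTF_8));
    }
}
```

Note that the key cannot be built with `"\u00ff/primaryDatacenter".getBytes(UTF_8)`, because U+00FF encodes to two bytes in UTF-8; the leading byte has to be written as a raw 0xFF.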