I’m using DR to do asynchronous replication from one primary region to multiple read-only regions, and using READ_LOCK_AWARE to read from replicas. I’m wondering about the atomicity guarantees of this approach:
When the DR replica lags behind too much and decides to pull the full snapshot, would the client see inconsistent data?
When the DR is in differential state, are logs from each transaction applied atomically?
It’s certainly true that transactions from the logs are applied atomically. Once the ranges have been restored from the primary, the DR cluster is guaranteed to see a prefix of the commit log and will never see only parts of a transaction.
It’s been a while since I’ve looked into this, but I believe the DR side won’t elect to re-pull a full snapshot, even if it’s very behind, unless the operator asks using one of the fdbdr commands.
I think embedded in your question is “how do I know if the DR side has fully restored the range, so that I’m guaranteed to see an internally consistent (possibly stale) view of the database?” I’m less sure about this part. I believe there’s a field in the status output (possibly under dr_backup), but I admit I haven’t verified that.
In general, our recommendation would be to use a multi-region cluster instead of async DR, as that solution has better semantics. Once you’ve begun a transaction, you’re guaranteed to see a consistent view of the database, even from the remote side. For example, if the remote side is missing a range of data, it can fetch that data from the primary, and the cluster can rehydrate the remote side from the primary (and the primary from the remote) if all copies of some range of data are lost from one side. The cost is that when a transaction begins, it needs to go to the primary side to get a “read version”. If you need to avoid that hop but are okay with stale reads, our suggestion is to cache read versions locally, and then call setReadVersion on transactions with a previously acquired version.
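To make the read-version caching idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an official pattern: the `ReadVersionCache` class, its `fetch` callback, and the `max_age` parameter are all hypothetical names I made up for this example, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Optional


class ReadVersionCache:
    """Hypothetical helper: reuse a read version until it is older than max_age seconds."""

    def __init__(
        self,
        fetch: Callable[[], int],
        max_age: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # callback that obtains a fresh read version
        self._max_age = max_age      # staleness bound, in seconds (assumed tunable)
        self._clock = clock          # injectable clock, mainly for testing
        self._version: Optional[int] = None
        self._fetched_at = 0.0

    def get(self) -> int:
        """Return the cached version, refreshing it only when it has expired."""
        now = self._clock()
        if self._version is None or now - self._fetched_at > self._max_age:
            self._version = self._fetch()
            self._fetched_at = now
        return self._version
```

With the Python bindings, `fetch` could be something like `lambda: db.create_transaction().get_read_version().wait()`, and each stale-tolerant transaction would then call `tr.set_read_version(cache.get())` before its first read. Keep `max_age` well below the roughly five-second window in which a read version remains usable; as far as I know, reads at an older version fail with a transaction_too_old error.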