Txn too old exception when performing reads

Hi,

While performing a read query on a range (~12k records), I came across the “Transaction is too old to perform reads or be committed” error. I have already capped the time limit at 4000 ms (as a workaround for the 5 s rule) in the ExecuteProperties, which should essentially terminate the transaction and return the continuation.
I wanted to understand in what scenarios (while performing reads) I could still get this retryable exception. How can I mitigate it?

Thanks.


tagging @alloc in case he has an answer here.

What was the read version used in the transaction? Is it possible that the read version was set explicitly by the client to some older version that became ‘too_old’ before 5s limit?

Hi Gaurav,

No read version was set explicitly by the client. A little more background: we have a single append-only writer job that writes the records as well as the indexes to the db, and a single client read operation. The cluster is running in triple redundancy mode on ssd, with 16TB of data (sum of K-V sizes) and 65TB of disk space.

The 4 s cutoff time limit is also set on the query execution.

Interesting. I would expect that setting a 4 second limit would protect against most “transaction too old” errors, but note that the 5 second limit is only “roughly” 5 seconds. In particular, the reason there is a 5 second limit on reads is that each read is conducted at a specific “database version” (essentially, commit timestamp) so your reads won’t see things added after your transaction begins, and each storage server only stores 5 seconds worth of history. You then get the “transaction too old” error if the storage server no longer has historical information from when you began the transaction. I think it can decide to forget about older versions “faster” than is strictly necessary if it decides that the cluster is too busy (but I could be wrong).
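To make the version-window explanation concrete, here is a toy Java model (not FoundationDB code) of a single storage server that keeps a bounded window of multi-version history. It assumes versions advance at about 1,000,000 per second, so a 5 second window is roughly 5,000,000 versions; the class `ToyStorageServer` and its methods are hypothetical. A read at a version older than the oldest retained version fails the way transaction_too_old does.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of one storage server's multi-version history (not FoundationDB code).
 * Assumption: versions advance at ~1,000,000 per second.
 */
final class ToyStorageServer {
    static final long VERSIONS_PER_SECOND = 1_000_000L;

    private final long historyVersions;                            // size of the retained window
    private final TreeMap<Long, String> history = new TreeMap<>(); // version -> value of one key
    private long latestVersion = 0;

    ToyStorageServer(double historySeconds) {
        this.historyVersions = (long) (historySeconds * VERSIONS_PER_SECOND);
    }

    /** Oldest version a read may still use. */
    long oldestReadableVersion() {
        return latestVersion - historyVersions;
    }

    /** Record a write at `version` and discard history that fell out of the window. */
    void write(long version, String value) {
        history.put(version, value);
        latestVersion = Math.max(latestVersion, version);
        // Keep the newest entry at or below the window start so reads there still resolve.
        Long floor = history.floorKey(oldestReadableVersion());
        if (floor != null) {
            history.headMap(floor, false).clear();
        }
    }

    /** Read the key as of `readVersion`; fails like transaction_too_old if that version is gone. */
    String read(long readVersion) {
        if (readVersion < oldestReadableVersion()) {
            throw new IllegalStateException("transaction_too_old: read version " + readVersion
                    + " is older than " + oldestReadableVersion());
        }
        Map.Entry<Long, String> entry = history.floorEntry(readVersion);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        ToyStorageServer server = new ToyStorageServer(5.0);
        server.write(1_000_000L, "a");     // t = 1 s
        long readVersion = 1_000_000L;     // a transaction starts here
        server.write(4_000_000L, "b");     // t = 4 s: still inside the window
        System.out.println(server.read(readVersion));
        server.write(7_000_000L, "c");     // t = 7 s: the read version is now 6 s old
        try {
            server.read(readVersion);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Constructing it with a window shorter than 5 seconds is one way to model the (unconfirmed) possibility above that a busy cluster discards history early.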

So, I guess I don’t know for certain. I suppose it’s possible that the storage servers are too busy and are keeping less than 5 seconds worth of history, or that tail latencies in the “last read” might result in the final read taking more than a second, or that activity between when the transaction starts and when the query starts might eat into the 1 second of buffer you’ve got, or a combination of all three or something else.
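The buffer reasoning above can be written as simple arithmetic. This sketch uses hypothetical names and illustrative numbers, not measured values: the safe time limit is whatever history remains after subtracting the setup time before the query and the tail latency of the last read.

```java
/**
 * Back-of-the-envelope time-limit budget (illustrative; the names are hypothetical,
 * not Record Layer or FoundationDB settings). All values are in milliseconds.
 */
final class ScanBudget {
    /**
     * @param historyMillis     history the storage servers actually keep (nominally about 5000)
     * @param setupMillis       time between getting the read version and starting the query
     * @param tailLatencyMillis how long the final in-flight read may take
     * @return the largest time limit that should finish before the read version expires, or 0
     */
    static long safeTimeLimitMillis(long historyMillis, long setupMillis, long tailLatencyMillis) {
        return Math.max(0L, historyMillis - setupMillis - tailLatencyMillis);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 500 ms of setup plus an 800 ms tail read.
        System.out.println(safeTimeLimitMillis(5_000, 500, 800)); // prints 3700
    }
}
```

With these numbers the budget is 3700 ms, already below a 4000 ms limit; if the servers kept less than 5 s of history, it would shrink further.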

In terms of amelioration, I guess you could try 3 seconds instead of 4 seconds as the time limit. The other thing that I’ve considered is that perhaps there should be an ExecuteProperties option to ignore transaction_too_old errors (and perhaps other retriable errors) and have the cursor return a continuation with a “no next reason” of something like “hit an error”. You can kind of simulate that today if you keep track of continuations “as you go”, but it’s a bit of a pain.

Does this timeout being set in the record layer cause a timeout to be set on the FDB transaction?

No, those are separate timeouts.

The Record Layer timeout here (in the ExecuteProperties) just stops a specific query from executing after a set time. (It can also be configured to stop a query after a set number of keys read or a set number of bytes read.) When the query stops, you get back a continuation marking where it stopped, along with an enum (the “no next reason”) saying why, for example that it ran out of time.
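As a rough illustration of what those limits do, here is a toy scan (the class `ToyLimitedScan` is hypothetical, not the Record Layer implementation) that stops on a time, scanned-record, or scanned-byte limit and hands back a continuation plus a reason. The enum names are loosely modeled on the Record Layer's `NoNextReason`, but this code is only a model. It checks limits only after reading at least one record, matching the at-least-one-record behavior described later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Toy scan with Record-Layer-style limits (a model, not the Record Layer implementation). */
final class ToyLimitedScan {
    /** Why the scan stopped; names loosely modeled on the Record Layer's NoNextReason. */
    enum NoNextReason { SOURCE_EXHAUSTED, TIME_LIMIT_REACHED, SCAN_LIMIT_REACHED }

    /** Records read in this call, where to resume, and why the scan stopped. */
    static final class Result {
        final List<String> records;
        final int continuation; // index of the next record to read
        final NoNextReason reason;

        Result(List<String> records, int continuation, NoNextReason reason) {
            this.records = records;
            this.continuation = continuation;
            this.reason = reason;
        }
    }

    static Result scan(List<String> source, int start, long timeLimitMillis,
                       int scannedRecordsLimit, long scannedBytesLimit, LongSupplier clockMillis) {
        long startMillis = clockMillis.getAsLong();
        List<String> out = new ArrayList<>();
        long bytes = 0; // string length stands in for byte size in this toy
        int next = start;
        while (next < source.size()) {
            // Limits are only checked once at least one record has been read.
            if (!out.isEmpty()) {
                if (clockMillis.getAsLong() - startMillis >= timeLimitMillis) {
                    return new Result(out, next, NoNextReason.TIME_LIMIT_REACHED);
                }
                // This toy reports both the record and the byte limit as SCAN_LIMIT_REACHED.
                if (out.size() >= scannedRecordsLimit || bytes >= scannedBytesLimit) {
                    return new Result(out, next, NoNextReason.SCAN_LIMIT_REACHED);
                }
            }
            String record = source.get(next++);
            out.add(record);
            bytes += record.length();
        }
        return new Result(out, next, NoNextReason.SOURCE_EXHAUSTED);
    }

    public static void main(String[] args) {
        List<String> data = List.of("r0", "r1", "r2", "r3", "r4");
        Result first = scan(data, 0, 1_000, 2, Long.MAX_VALUE, System::currentTimeMillis);
        // Resume from the continuation returned by the first call.
        Result rest = scan(data, first.continuation, 1_000, 10, Long.MAX_VALUE, System::currentTimeMillis);
        System.out.println(first.reason + " " + first.records + " then " + rest.reason + " " + rest.records);
    }
}
```

Passing the clock in as a `LongSupplier` keeps the toy deterministic under test; a real caller would simply resume with the returned continuation.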

There are ways of setting the FDB client transaction timeout through the Record Layer, but that’s a different set of APIs.

The reasoning behind setting the time limit at 4 s was to provide 1 s of buffer for the combination of possibilities you mentioned. I have tried reducing it further to 3 s and still get the exception. Reducing the time limit any further would increase the number of db calls and make reads much slower. I’m uncertain why this is happening. The query is pretty heavy and scans ~12k records or more. Is there any other workaround (other than the painful one) or other possibilities I should explore?

Thanks

Interesting that it happens after 3 seconds as well…

Do you have instrumentation on the query? For example, can you tell whether it “really” is taking 5 seconds (and is somehow ignoring the limit), or whether the cluster is returning the error earlier than normal? Perhaps more to the point, if this is a cluster performance problem that requires investigation, it might not be easy to fix in this forum without (perhaps a lot) more information.
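One lightweight way to get that instrumentation is to timestamp each phase relative to when the transaction got its read version. The `ScanTimeline` helper below is hypothetical, not a Record Layer API; it takes an injectable millisecond clock (in production something like `System.currentTimeMillis()`), and the assumption is that you construct it right after obtaining the read version and call the hooks around each cursor read.

```java
import java.util.Locale;
import java.util.function.LongSupplier;

/**
 * Hypothetical instrumentation helper: timestamps relative to when the transaction
 * obtained its read version, so a transaction_too_old error can be placed on a timeline.
 */
final class ScanTimeline {
    private final LongSupplier clockMillis;
    private final long txnStartMillis;   // construct right after the read version is obtained
    private long queryStartMillis = -1;  // -1 until queryStarted() is called
    private long readStartMillis = -1;
    private long slowestReadMillis = 0;
    private int reads = 0;

    ScanTimeline(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.txnStartMillis = clockMillis.getAsLong();
    }

    void queryStarted() { queryStartMillis = clockMillis.getAsLong(); }

    void readStarted() { readStartMillis = clockMillis.getAsLong(); }

    void readFinished() {
        slowestReadMillis = Math.max(slowestReadMillis, clockMillis.getAsLong() - readStartMillis);
        reads++;
    }

    long transactionAgeMillis() { return clockMillis.getAsLong() - txnStartMillis; }

    /** One log line: total transaction age, setup time before the query, and read stats. */
    String summary(String outcome) {
        long setupMillis = queryStartMillis < 0 ? -1 : queryStartMillis - txnStartMillis;
        return String.format(Locale.ROOT, "outcome=%s txnAgeMs=%d setupMs=%d reads=%d slowestReadMs=%d",
                outcome, transactionAgeMillis(), setupMillis, reads, slowestReadMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        ScanTimeline timeline = new ScanTimeline(System::currentTimeMillis);
        timeline.queryStarted();
        timeline.readStarted();
        Thread.sleep(20); // stand-in for a cursor read
        timeline.readFinished();
        System.out.println(timeline.summary("ok"));
    }
}
```

Logging `summary(...)` where the too-old error is caught would show whether the transaction was really about 5 s old (limit ignored) or much younger (cluster returning the error early).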

The other thing I sort of mentioned is that you can keep track of the continuation “so far” and use it instead of the continuation you get at the end of the query. Something like:

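```java
import com.apple.foundationdb.record.RecordCursorResult;
import com.apple.foundationdb.record.provider.foundationdb.FDBStoreTransactionIsTooOldException;

// Body of a method returning RecordCursorResult<?>, with `cursor` in scope.
// Track the most recent result so that, if the transaction becomes too old,
// its continuation still points just past the last record processed.
RecordCursorResult<?> result = null;
try {
    do {
        result = cursor.getNext();
        if (result.hasNext()) {
            processResult(result.get()); // up to you what to do here. Perhaps put in a list to return?
        }
    } while (result.hasNext());
    // Normal stop: result.getNoNextReason() says why (e.g. time limit or end of data).
    return result;
} catch (FDBStoreTransactionIsTooOldException err) { // maybe FDBStoreRetriableException instead
    // maybe log that there was an error? Depending on how getNext() surfaces errors,
    // you may need to inspect the exception's cause instead.
    // result is the last successful read (null if the very first read failed);
    // result.getContinuation() resumes just after it.
    return result;
}
```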

(I wrote this using the blocking APIs, but I think there are non-blocking ones. I also haven’t compiled it, so, you know, there may be syntax errors.) The idea is that if you hit the error, you still “make progress” as much as possible. This doesn’t address the underlying issue, if there is one, but it may let you avoid its symptoms. This is also (more or less) the logic I suggested the Record Layer could perhaps handle for you instead (by catching retriable errors itself).
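Below is a self-contained simulation of that pattern across several transactions. Everything in it (`ResumeOnTooOld`, `ToyCursor`, `TooOldException`) is a toy stand-in, not the Record Layer API: each simulated transaction can only read a fixed number of records before failing, and the driver resumes from the last continuation it saw.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simulation of "remember the continuation as you go" across several transactions.
 * All types are toy stand-ins, not the Record Layer API.
 */
final class ResumeOnTooOld {
    /** Stand-in for FDBStoreTransactionIsTooOldException. */
    static final class TooOldException extends RuntimeException {
        TooOldException() { super("transaction_too_old (simulated)"); }
    }

    /** Cursor over source[start..] whose transaction goes "too old" after a fixed number of reads. */
    static final class ToyCursor {
        private final List<String> source;
        private int position;
        private int readsLeft;

        ToyCursor(List<String> source, int start, int readsBeforeTooOld) {
            this.source = source;
            this.position = start;
            this.readsLeft = readsBeforeTooOld;
        }

        /** Next record, or null once the source is exhausted. */
        String next() {
            if (position >= source.size()) {
                return null;
            }
            if (readsLeft <= 0) {
                throw new TooOldException();
            }
            readsLeft--;
            return source.get(position++);
        }

        /** Where a new cursor should start to continue after the last returned record. */
        int continuation() { return position; }
    }

    static final class Outcome {
        final List<String> processed = new ArrayList<>();
        int transactions = 0;
    }

    static Outcome scanAll(List<String> source, int readsBeforeTooOld) {
        if (readsBeforeTooOld <= 0) {
            throw new IllegalArgumentException("each transaction must be able to read at least one record");
        }
        Outcome outcome = new Outcome();
        int continuation = 0;
        while (true) {
            outcome.transactions++; // a fresh (simulated) transaction per attempt
            ToyCursor cursor = new ToyCursor(source, continuation, readsBeforeTooOld);
            try {
                String record;
                while ((record = cursor.next()) != null) {
                    outcome.processed.add(record);
                    continuation = cursor.continuation(); // progress saved after every record
                }
                return outcome; // source exhausted
            } catch (TooOldException e) {
                // Keep the work done so far; the next attempt resumes from `continuation`.
            }
        }
    }

    public static void main(String[] args) {
        Outcome outcome = scanAll(List.of("r0", "r1", "r2", "r3", "r4"), 2);
        System.out.println(outcome.processed + " in " + outcome.transactions + " transactions");
    }
}
```

One caveat: a transaction that fails with transaction_too_old cannot commit, so this only works if processing a record does not depend on writes made in that same transaction.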

Another tactic to avoid this error might be to look at the query itself. Perhaps you’ve already done this, but if it’s heavy-duty, there may be ways of optimizing it (perhaps by adding indexes or something) to make it less heavyweight. This is particularly true if you are reading and discarding a lot of records (which you didn’t say you were, but if you are, there may be ways to help).

One further note to be aware of: with its scan limit enforcement logic (limiting on time, scanned records, and scanned bytes), the Record Layer ensures that any scan that is initiated makes at least some progress; that is, it reads at least one record, regardless of any scan limits specified. You can see this in action by setting the scan limits impossibly low: you will still get at least one record read.

Due to this behavior, you can sometimes get a transaction_too_old even though you asked the scan to stop automatically before FDB would have raised that error. You could demonstrate this by opening your transaction, sleeping for six seconds, and then doing the read.
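The six-second experiment can be modeled with a fake clock. In this toy (`ForcedFirstRead` is hypothetical, and it assumes a flat 5 second history window), the time limit is consulted only after the first read, so a transaction that is already too old fails on that first read no matter how small the limit is, while a fresh one still reads exactly one record with a zero limit.

```java
import java.util.function.LongSupplier;

/**
 * Toy model: a scan that always makes one read before checking its time limit,
 * against a server that keeps an assumed 5 s of history. Not Record Layer code.
 */
final class ForcedFirstRead {
    static final long HISTORY_MILLIS = 5_000; // assumption: nominal history window

    static final class TooOldException extends RuntimeException {
        TooOldException(long ageMillis) { super("transaction_too_old at age " + ageMillis + " ms"); }
    }

    /** Reads up to `available` records and returns how many were read. */
    static int scan(long txnStartMillis, int available, long timeLimitMillis, LongSupplier clockMillis) {
        long scanStartMillis = clockMillis.getAsLong();
        int read = 0;
        while (read < available) {
            // The time limit is only consulted after the first record, so progress is guaranteed.
            if (read > 0 && clockMillis.getAsLong() - scanStartMillis >= timeLimitMillis) {
                break;
            }
            long ageMillis = clockMillis.getAsLong() - txnStartMillis;
            if (ageMillis > HISTORY_MILLIS) {
                throw new TooOldException(ageMillis); // the read version has aged out
            }
            read++;
        }
        return read;
    }

    public static void main(String[] args) {
        long txnStart = 0;
        // Fresh transaction, absurdly small limit: still reads exactly one record.
        System.out.println(scan(txnStart, 10, 0, () -> 100L));
        // Transaction opened, then the client "sleeps" six seconds before scanning.
        try {
            scan(txnStart, 10, 0, () -> 6_000L);
        } catch (TooOldException e) {
            System.out.println(e.getMessage());
        }
    }
}
```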

In your case, there may be a legitimate problem that needs to be investigated; however, a good defensive approach is to also include logic like the one @alloc suggested above.