Just curious what the defaults are for the client timeout values and retry values, so that I can know whether I need to change those from their defaults for my use case. Also, a quick explanation of how those are used by the client could be helpful (i.e., does the client have a timeout set on each attempt, and then an overarching timeout value for all attempts in total? Or is the timeout just set for the overall set of attempts, and the client retries until either the max attempts are reached OR the timeout value is reached?)
The default behavior is to not time out or limit retries at all. If there is a blocking operation that's not being fulfilled or if the transaction is being retried repeatedly, it will continue to do so indefinitely.
To use timeouts and/or retry limits, you would set the appropriate transaction option during each attempt. The timeout is based on the start time of the first try. If either limit is reached, then your operations will start to fail with an error. In the case of timeouts, you'll get a timeout error, while if you hit the retry limit it will return to you the last error that caused the transaction to fail. If you don't set the option on a particular transaction attempt, then the limit won't apply for that attempt, even if you set it during an earlier attempt.
The retry limit behavior is tracked in the onError function. All of our high level retry loops are implemented using onError, but if you are writing your own, you'll probably want to use onError to determine if you should retry or not.
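To make the above concrete, here is a minimal sketch of a hand-written retry loop in Python. It assumes the Python binding's `db.create_transaction()`, `tr.options.set_timeout(...)`, `tr.options.set_retry_limit(...)`, `tr.commit().wait()` and `tr.on_error(e).wait()`. The name `run_with_limits`, its parameters (including `error_type`) and the default values are made up for this example.

```python
def run_with_limits(db, work, timeout_ms=60_000, retry_limit=100,
                    error_type=Exception):
    """Run work(tr) in a hand-written retry loop with a timeout and retry limit.

    error_type is a hypothetical parameter for this sketch; with the real
    binding you would pass fdb.FDBError so unrelated exceptions propagate.
    """
    tr = db.create_transaction()
    while True:
        try:
            # Set the options on every attempt: a limit that is not set on
            # a given attempt does not apply to that attempt.
            tr.options.set_timeout(timeout_ms)
            tr.options.set_retry_limit(retry_limit)
            result = work(tr)
            tr.commit().wait()
            return result
        except error_type as e:
            # on_error decides whether a retry is allowed. If it is, wait()
            # returns after the backoff delay; if not, wait() raises and
            # the loop ends with that error.
            tr.on_error(e).wait()
```

The loop itself never counts attempts or checks the clock. Both limits are enforced by the client through the options and `on_error`, which is why the options have to be re-applied at the top of each pass.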
Just to continue on with what @ajbeamon was saying, here's the link for the (Python) transaction options: https://apple.github.io/foundationdb/api-python.html#transaction-options And here they are for Java: https://apple.github.io/foundationdb/javadoc/com/apple/foundationdb/TransactionOptions.html
The relevant ones for timeouts and retry loops would be (and it's possible I've missed one):
- set_retry_limit (set to -1 for ∞, 0 to only run the loop once, and k to run it up to k + 1 times, i.e., with k retries)
- set_max_retry_delay (to set how long you wait during exponential backoff)
- set_timeout (maximum number of milliseconds to wait before automatically cancelling a transaction)
Unfortunately, these have to be set on a per-transaction basis, so there isn't a way to say, "All transactions created with this database object should time out after 10 seconds". If you had your own retry loops, you could add that information in when you created the transaction, though there's an argument that we should handle that better, too.
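If you do create transactions yourself, a small factory is one way to add that information at creation time. This is only a sketch: `create_transaction_with_limits` and its default values are invented for the example, and it assumes the Python binding exposes the three options above as `tr.options.set_timeout`, `tr.options.set_retry_limit` and `tr.options.set_max_retry_delay`.

```python
def create_transaction_with_limits(db, timeout_ms=10_000, retry_limit=-1,
                                   max_retry_delay_ms=1_000):
    """Create a transaction and apply timeout/retry options to it.

    The defaults here are illustrative only: a 10 second timeout,
    unlimited retries (-1), and backoff capped at 1 second.
    """
    tr = db.create_transaction()
    tr.options.set_timeout(timeout_ms)
    tr.options.set_retry_limit(retry_limit)
    tr.options.set_max_retry_delay(max_retry_delay_ms)
    return tr
```

Keep in mind that this only covers the first attempt. As described earlier in the thread, the options need to be set again on each later attempt for the limits to keep applying.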
Oh, but also, if you are re-implementing retry loops, you might also want to look at the "isRetryable" method on errors (see the Javadocs). This will return "true" if the error is something that could be temporary (like, say, network failures or transaction subsystem reboots) and "false" if the error is something that retrying won't help with (like, say, key too large). You should only retry if that function returns true. The onError function handles that for you, in that it will error immediately if the error is not retryable and inject a delay before returning without error if it is, but you can also call it directly if you have some other system you are using to handle retry loops.
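For the "some other system" case, the rule boils down to: retry only when the error is retryable, and back off between attempts. Here is a generic Python sketch of that rule. Everything in it is hypothetical (`retry_with_predicate`, `attempt_once`, the delay values); `is_retryable` stands in for whatever check your binding offers, such as Java's isRetryable. It does not call onError, so the backoff here is the sketch's own and not the client's.

```python
import time

def retry_with_predicate(attempt_once, is_retryable, max_retries=5,
                         base_delay_s=0.01, max_delay_s=1.0,
                         sleep=time.sleep):
    """Call attempt_once() until it succeeds, retrying only retryable errors."""
    delay = base_delay_s
    retries = 0
    while True:
        try:
            return attempt_once()
        except Exception as e:
            # Give up at once on errors that retrying cannot fix
            # (e.g. key too large), or when the retry budget is spent.
            if not is_retryable(e) or retries >= max_retries:
                raise
            sleep(delay)
            # Exponential backoff, capped at max_delay_s.
            delay = min(delay * 2, max_delay_s)
            retries += 1
```

With `max_retries=k` the body runs at most k + 1 times, which mirrors how the retry limit option is described above.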
Just to be the one who disagrees:
I always thought that expecting (junior) developers to never forget to set these options on each and every transaction (including after a reset) was too optimistic, so I decided to bake the notion of a default timeout and retry limit into the binding itself. It automatically sets these options when creating a new transaction internally, before exposing it to the application (and also when the transaction resets).
If your binding does not do that, I suggest that you create a helper function that wraps the existing retry loops to automatically set these options, and then calls the original lambda function or handler.
After auditing a lot of "real life" code, this turned out to be a good choice, because nobody ever bothered setting a value. At least they had a default timeout (after 60 sec) instead of the equivalent of a while(true) { burn_some_cpu(); }.
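A helper of the kind suggested above could look roughly like this in Python. It is a sketch, not what any binding actually ships: `with_default_limits` and its defaults (60 second timeout, 100 retries) are invented, and it assumes the transaction is passed to the handler as its first argument and exposes `tr.options.set_timeout` and `tr.options.set_retry_limit`.

```python
import functools

def with_default_limits(func, timeout_ms=60_000, retry_limit=100):
    """Wrap a transaction handler so default limits are set before it runs."""
    @functools.wraps(func)
    def wrapper(tr, *args, **kwargs):
        # Applied every time the handler is invoked, so a retry loop that
        # re-runs the handler also re-applies the options on each attempt.
        tr.options.set_timeout(timeout_ms)
        tr.options.set_retry_limit(retry_limit)
        return func(tr, *args, **kwargs)
    return wrapper
```

With the Python binding, stacking `@fdb.transactional` on top of the wrapped function should give the same effect as the built-in defaults described above, since the decorator re-runs the function body on each retry. Treat that as an assumption to verify in your own setup.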
I think there is a strong case to be made for having the ability to specify these options (and others) more globally. The idea we've tossed around for this, which Alec alluded to, is to support setting transaction options at the database level that apply to every transaction created from that database object. However, this feature does not currently exist.
That said, I'm not sure I agree that the best general choice is for the bindings (or C client) to prescribe a particular non-infinite default timeout. Timeouts and retry limits are the kind of thing that's likely not to be exercised in testing unless specifically thought about, and having operations unexpectedly fail in certain real world conditions doesn't seem desirable. If you haven't written your code to expect possible timeouts (which is a reasonable choice for someone who doesn't need them), then you're likely to be surprised by this.
On the other hand, it doesn't seem onerous to me for someone that needs to deal with timeouts to also make a single call to set a global value. Then, they can set a value that makes sense for their use-case rather than a default that's likely not actually what they want anyway. I do agree, though, that shifting the burden to setting this option on every transaction is a pretty high expectation, and it would be good to fix this.
In Java, it's not that hard to extend the class and automatically inject logging, setting of transaction options, etc. into all calls. We happen to also require all transactions to have a name so that we can track tps, time taken, etc. as metrics. It's also helpful to automatically print retries when they happen.
How do you collect these? I don't think the native client supports naming transactions?
This looks interesting, and I should try to investigate how to support this also. I'm currently only tagging transactions with a client-side ID, but an app-provided tag could be very useful in some cases…
The native client doesn't currently offer this, though support for it is something that we're thinking about. I'll try to get an issue created on GitHub to track it.
No, it's not passed to the native client, it's just something we shim around the Java layer and we use Dropwizard Metrics for them. We also reference count objects with dispose() and call them proactively (but that's a different thing =p).
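For anyone wanting a similar shim outside Java, the idea can be sketched in a few lines of Python. None of this comes from the poster's code: `TransactionMetrics`, `run_named` and the `run_transaction` parameter are hypothetical, plain dictionaries stand in for a metrics library, and `run_transaction` is assumed to be whatever retry loop you already use, called with a handler that takes the transaction.

```python
import time
from collections import defaultdict

class TransactionMetrics:
    """Per-name counters kept purely on the client side."""
    def __init__(self):
        self.count = defaultdict(int)
        self.retries = defaultdict(int)
        self.total_seconds = defaultdict(float)

def run_named(name, metrics, run_transaction, work, clock=time.monotonic):
    """Run work under an existing retry loop, recording metrics by name."""
    attempts = 0

    def counted(tr):
        nonlocal attempts
        attempts += 1
        if attempts > 1:
            # Print retries as they happen.
            print(f"retrying transaction {name!r} (attempt {attempts})")
        return work(tr)

    start = clock()
    try:
        return run_transaction(counted)
    finally:
        # Recorded whether the transaction succeeded or failed.
        metrics.count[name] += 1
        metrics.retries[name] += max(attempts - 1, 0)
        metrics.total_seconds[name] += clock() - start
```

The name never reaches the native client here. It only keys the counters, which matches the "shim around the binding" approach described above.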
GitHub issue link: https://github.com/apple/foundationdb/issues/465
I agree with both ajbeamon and Christophe. Each developer shouldn't have to set a timeout, as junior developers may not know these exist and they will never be set.
It should be easy in the bindings to set a per-transaction override, but it seems counterintuitive that the client is where you'd define the global timeout.
My strong preference would be to be able to configure a cluster-wide timeout, which is set as a best practice by ops based on data rather than a "best guess" from development.
This really is a server feature, not a client feature, if you think about it in the true sense. The server says: "you should expect a response back in X time, or consider it a failure" rather than each client saying "I'm gonna assume things failed if I don't hear back from you in X time". In a well configured system, you will likely have both because you want more guarantees, but in most cases the server advertising a timeout would be sufficient as long as the client respects it.
Hi,
I'm using FDB in my Java Maven project and have a problem running a batch query. It fails with a "past_version" error. I set the timeout option to 60000 and the retry limit to 100 every time before running the transaction. When I run the transaction for the first time after starting my application, it is interrupted after about 50 seconds, but on subsequent runs it is interrupted after 5 seconds, and after retrying the transaction 10 times it throws a "past_version" error.
Could you please help me to solve this problem?
My system is macOS Mojave 10.14.3
java version - 11
FoundationDB version 3.0.9
That is unfortunately unrelated to a transaction timeout. FDB has a 5s transaction limit. See the known limitations. This means that you'll need to break your scan of data down into multiple transactions, with each next transaction starting the read at the last key that your previous transaction read. This means you'll no longer have transactional reads though. If you need to do transactional reads, then the easiest way is to suffix your keys with a versionstamp, and filter out newer versions manually.
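Breaking the scan up could look something like the following Python sketch. It assumes the Python binding's `db.get_range(begin, end, limit=...)`, which runs in its own transaction and returns a list of key-value objects with `.key` and `.value`, and it assumes `begin` and `end` are plain byte-string keys. The function name `scan_in_batches` and the batch size are made up.

```python
def scan_in_batches(db, begin, end, batch_size=1000):
    """Yield (key, value) pairs in [begin, end), one short read per batch.

    Each batch is a separate transaction, so the scan as a whole is NOT
    a single consistent snapshot.
    """
    while True:
        batch = db.get_range(begin, end, limit=batch_size)
        for kv in batch:
            yield kv.key, kv.value
        if len(batch) < batch_size:
            # A short batch means the range is exhausted.
            return
        # Resume just past the last key read: key + b"\x00" is the
        # smallest key strictly greater than key.
        begin = batch[-1].key + b"\x00"
```

The batch size should be small enough that each read finishes well inside the 5s limit. If writers are active during the scan, later batches can reflect changes that earlier ones did not, which is the loss of transactional reads mentioned above.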
Thank you for your quick response.
I read about the limitation, but the fact that the first transaction after starting my application runs for about 50 seconds gave me the idea that maybe it is possible to customize the transaction limit. Do you have any idea why the limitation does not apply the first time?
Thanks in advance.
There is no reason within FoundationDB that a transaction could last 50s. The only ordinary way this could appear is if it reliably takes you 45s to be able to connect to the database. However, it seems awfully suspicious to me that you set a transaction to retry 10 times, and you're seeing a 10 x 5 second delay…
I'm using a batch transaction to create multiple records. The first time, all records are created within 50 seconds without any retry; only the following transactions are retried 10 times, 5 seconds each. Anyway, thank you, I will try to solve the problem based on your suggestions.
It seems that different database drivers have different default timeout configurations. For example, go-redis has DialTimeout=5s and ReadTimeout = WriteTimeout = 3s,
whereas the MongoDB Go driver has ConnectTimeout=30s and SocketTimeout=0 (can block indefinitely).
As far as I'm concerned, defaulting the read/write timeout to 0, meaning an operation may block indefinitely, is a reasonable configuration.