`fdb_future_block_until_ready` and retry logic

rajivr · July 29, 2021, 4:11pm

I was wondering if fdb_future_block_until_ready implements the recommended retry logic internally?

I tried looking at the Go API, and it seems like fdb_future_block_until_ready might be implementing retry logic internally (as code path calls fdb_future_get_key immediately after BlockUntilReady).

In the case of the Java API, it looks like it is not making use of Future_blockUntilReady.

Best,
Rajiv

andrew.noyes · July 29, 2021, 6:50pm

I’m not sure what exactly you mean by “recommended retry logic”. fdb_future_block_until_ready doesn’t implement any retry logic though. I think the recommended retry logic is to restart your transaction from the beginning if any operation in your transaction fails with a retriable error. The easiest way to do this is to wait on the future returned by fdb_transaction_on_error, which will become ready after an appropriate backoff or fail with an error if the error is not retriable. There is also fdb_error_predicate if you want to implement your own backoff strategy and just want to know if the error is retriable or not (this is a bit of an advanced, use-at-your-own-risk feature).

Regarding fdb_future_block_until_ready, there are basically two phases of interacting with fdb futures - wait until the future is ready, and then inspect the future to retrieve its value or error status. fdb_future_block_until_ready is just a mechanism for waiting until the future is ready, blocking the current thread until the future is ready to inspect. The other option is to use fdb_future_set_callback and register a callback that will be called some time after the future is ready - this just happens to be the method the java bindings use.
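The two phases described above can be sketched in Rust with a mock future. This is only an illustration under stated assumptions: `MockFuture`, `complete`, `get`, and `wait_then_inspect` are hypothetical names, the future is completed by a plain `std::thread` standing in for the FDB network thread, and no real C API call is made. The point is that the waiting step does nothing but wait; it never inspects the result and never retries.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for an FDB future: holds either a value or an
/// error code once it becomes ready. Not the real binding type.
pub struct MockFuture {
    state: Mutex<Option<Result<Vec<u8>, i32>>>,
    ready: Condvar,
}

impl MockFuture {
    pub fn new() -> Arc<Self> {
        Arc::new(MockFuture {
            state: Mutex::new(None),
            ready: Condvar::new(),
        })
    }

    /// Completes the future (in real FDB this happens on the network thread).
    pub fn complete(&self, result: Result<Vec<u8>, i32>) {
        *self.state.lock().unwrap() = Some(result);
        self.ready.notify_all();
    }

    /// Phase 1: block the calling thread until the future is ready.
    /// It only waits; it does not look at the result or retry anything.
    pub fn block_until_ready(&self) {
        let mut guard = self.state.lock().unwrap();
        while guard.is_none() {
            guard = self.ready.wait(guard).unwrap();
        }
    }

    /// Phase 2: inspect the future. Returns `None` if it is not ready yet,
    /// otherwise a copy of the value or the error code.
    pub fn get(&self) -> Option<Result<Vec<u8>, i32>> {
        self.state.lock().unwrap().clone()
    }
}

/// Completes a mock future on another thread, waits for it, then inspects it.
pub fn wait_then_inspect(result: Result<Vec<u8>, i32>) -> Result<Vec<u8>, i32> {
    let fut = MockFuture::new();
    let producer = Arc::clone(&fut);
    let handle = thread::spawn(move || producer.complete(result));
    fut.block_until_ready();
    handle.join().unwrap();
    fut.get().expect("future is ready after block_until_ready")
}
```

An error stored in the future simply comes back from the inspection step; deciding whether to retry is left to the caller.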

There’s a suite of functions for inspecting the results of futures, depending on the type of value the future holds. fdb_future_get_key is the mechanism for futures that hold keys. There are more fdb_future_get_*() functions for other types, e.g. fdb_future_get_keyvalue_array for obtaining the result of get range operations. Keep in mind that the underlying memory used to store future results is owned by the future itself and does not outlive it, so many bindings copy the results into a language-native representation before destroying the future.

Anyway, this is all mostly only relevant for bindings authors or people using the C API directly. Let me know if any part of this is unclear - hope it helps.

rajivr · July 30, 2021, 8:01am

@andrew.noyes Firstly, thanks a lot for the reply. I really appreciate it!

I think I now have a slightly better understanding of the design of the Java and Go bindings.

At a high level, it seems the transaction API in Java and Go operates on a function-like object. FDB futures, along with the appropriate business logic, are composed inside this function-like object.

If the evaluation of this function-like object fails with a retry-able error, then the function-like object gets re-evaluated using fdb_transaction_on_error. Otherwise either the success value or an error gets returned to the caller.

Would this be a fair characterization?

I am currently exploring how to effectively use FDB Rust bindings along with Tokio. Hence the questions relating to C APIs. Yes, your reply indeed helped a lot! Thanks again!

Best,
Rajiv

andrew.noyes · August 2, 2021, 6:06pm

Yes, this sounds fair. The idiomatic way to execute a transaction in all bindings is to pass a callback (one that takes a transaction as an argument) to a function that handles retries - and you’ve linked that “retry loop” function for Java and Go.
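A minimal sketch of such a retry loop in Rust, using mocks rather than the real bindings. `MockTransaction`, `run`, and this `FdbError` wrapper are hypothetical; `on_error` only plays the role of fdb_transaction_on_error and, as a simplifying assumption, treats error 1020 (not_committed) as the only retriable error and skips the backoff delay.

```rust
use std::cell::Cell;

/// Hypothetical wrapper around an FDB error code (not the real crate type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FdbError(pub i32);

pub type FdbResult<T> = Result<T, FdbError>;

/// Mock transaction; a real binding would wrap an `FDBTransaction*`.
pub struct MockTransaction {
    resets: Cell<u32>,
}

impl MockTransaction {
    pub fn new() -> Self {
        MockTransaction { resets: Cell::new(0) }
    }

    /// Plays the role of `fdb_transaction_on_error`: `Ok(())` means the
    /// transaction was reset and may be retried, `Err` means the error is
    /// not retriable.
    /// ASSUMPTION: only 1020 (not_committed) is retriable here, no backoff.
    pub fn on_error(&self, err: FdbError) -> FdbResult<()> {
        if err.0 == 1020 {
            self.resets.set(self.resets.get() + 1);
            Ok(())
        } else {
            Err(err)
        }
    }

    /// Mock commit that always succeeds.
    pub fn commit(&self) -> FdbResult<()> {
        Ok(())
    }

    /// Number of times the transaction was reset for a retry.
    pub fn resets(&self) -> u32 {
        self.resets.get()
    }
}

/// Retry loop: evaluate `body`, commit, and on failure ask `on_error`
/// whether to start again from the beginning.
pub fn run<T, F>(tr: &MockTransaction, mut body: F) -> FdbResult<T>
where
    F: FnMut(&MockTransaction) -> FdbResult<T>,
{
    loop {
        let attempt = body(tr).and_then(|value| tr.commit().map(|_| value));
        match attempt {
            Ok(value) => return Ok(value),
            // Retriable: loop again. Not retriable: `?` returns the error.
            Err(err) => tr.on_error(err)?,
        }
    }
}
```

Because the closure may run several times, it is typed as `FnMut` and should be safe to re-evaluate from the beginning.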

I have a (hobby-only) interest in this so I’m happy to help answer questions etc in my hobby time

rajivr · August 6, 2021, 6:01am

@andrew.noyes I noticed there is a specific pattern in which interfaces ReadTransactionContext, ReadTransaction, TransactionContext, Transaction, and Database are organized in Java and Go bindings.

Did this organization evolve over time or was there some thought given to this design initially?

Also, here is the design of the FdbFuture that I am currently prototyping. It’s far from final!

andrew.noyes · August 6, 2021, 4:24pm

I’m not sure I’m the best person to answer that - I think this organization predates my time with foundationdb.

I think this is using some rust features I’m not very familiar with but from what I understand it looks like a reasonable way to model a blocking API for futures. A couple of thoughts:

There are other things you can do with futures that aren’t modeled yet, e.g. cancellation
I’m not sure exactly what check is doing, but since there are many foundationdb errors that are intended to be handled gracefully/retried, it probably shouldn’t panic. Presumably FdbResult is an enum that can encode an error?
Is the plan to copy the memory for types with memory owned by futures?

In case you didn’t see this yet there is also prior art in https://crates.io/crates/foundationdb. Is that not usable with Tokio?

rajivr · August 7, 2021, 3:22am

I’ve tried to model cancellation implicitly using Rust drop semantics, relying on the fact that the C API function fdb_future_destroy also cancels the future.

Introducing explicit cancellation would have meant that we would have to maintain cancellation state information in FdbFuture<T>, and it would have also increased the surface area of the API.

I was wondering if there is a use case where we would need access to an FdbFuture<T> that has been cancelled but not destroyed?

check does not panic. check takes an fdb_error_t and returns an FdbResult<T>, which is basically a type synonym:

/// Alias for [`Result`]`<T,`[`FdbError`]`>`
///
/// [`Result`]: std::result::Result
/// [`FdbError`]: crate::FdbError
pub type FdbResult<T> = Result<T, FdbError>;

check is similar to eval in the current FoundationDB Rust crate. I used check instead because it is an idiom covered in the “Foreign Functions” chapter of the Programming Rust book.
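For illustration, a `check` helper of this kind could look roughly like the sketch below. The `FdbError` struct and its `new`/`code` methods are assumptions made for the example, and the return type is fixed to `FdbResult<()>` for simplicity; the only fact relied on is that an fdb_error_t of zero means success.

```rust
/// Hypothetical error type wrapping a raw FDB error code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FdbError {
    code: i32,
}

impl FdbError {
    pub fn new(code: i32) -> Self {
        FdbError { code }
    }

    pub fn code(&self) -> i32 {
        self.code
    }
}

pub type FdbResult<T> = Result<T, FdbError>;

/// Converts a raw `fdb_error_t` (a C int) into a `Result` without panicking:
/// zero is success, anything else is returned as an error value.
pub fn check(err: i32) -> FdbResult<()> {
    if err == 0 {
        Ok(())
    } else {
        Err(FdbError::new(err))
    }
}
```

Returning the error as a value lets the caller propagate it with `?` and leaves the retry decision to the transaction layer.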

Yes, in FdbFuture<T>, T must be an owned type (i.e., FdbFuture<T> owns the data that T might contain). In Rust, we can’t explicitly set a trait bound on T to say that it must be an owned type, but that is the idea.

The join method takes self (instead of the usual &self or &mut self), which transfers the ownership of FdbFuture<T> to the join method. Once the join method completes, ownership of T gets transferred to the caller via FdbResult<T> and self: FdbFuture<T> gets dropped, thereby destroying the FDB future.
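The ownership flow described above can be sketched with a mock type. `MockFdbFuture` is hypothetical and not the prototype's actual `FdbFuture<T>`: it stores its "future-owned" bytes in a `Vec<u8>`, and its `Drop` implementation sets a flag where a real binding would presumably call fdb_future_destroy.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical error wrapper (not the real crate type).
#[derive(Debug, PartialEq, Eq)]
pub struct FdbError(pub i32);

pub type FdbResult<T> = Result<T, FdbError>;

/// Mock future; a real one would hold an `FDBFuture*`.
pub struct MockFdbFuture {
    raw: Vec<u8>,              // stands in for memory owned by the future
    error: i32,                // 0 means success
    destroyed: Rc<Cell<bool>>, // lets the caller observe destruction
}

impl MockFdbFuture {
    pub fn new(raw: &[u8], error: i32, destroyed: Rc<Cell<bool>>) -> Self {
        MockFdbFuture {
            raw: raw.to_vec(),
            error,
            destroyed,
        }
    }

    /// Takes `self` by value, so the future cannot be used after `join`.
    /// (A real implementation would block until the future is ready first.)
    pub fn join(self) -> FdbResult<Vec<u8>> {
        if self.error != 0 {
            return Err(FdbError(self.error));
        }
        // Copy the bytes out while the future is still alive.
        Ok(self.raw.clone())
        // `self` goes out of scope here, so `Drop::drop` runs.
    }
}

impl Drop for MockFdbFuture {
    fn drop(&mut self) {
        // ASSUMPTION: a real binding would call `fdb_future_destroy` here.
        self.destroyed.set(true);
    }
}
```

Dropping the future without calling `join` takes the same `Drop` path, which is how implicit cancellation falls out of this design.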

The plan is to implement the logic for copying in the FdbFutureGet::get trait implementations for the appropriate types.

Yes, I’ve looked into this crate and also the awesome work done by @PierreZ here in order to bring 6.3 support to this crate. This helped me a lot, in order to quickly come up to speed with my current effort.

As I studied the FoundationDB crate, I realized that there could be an impedance mismatch between how Rust Futures and FDB Futures work. This issue is explained in the section “The problem: completion, cancellation and buffer management” in this blog post.

The current FoundationDB crate is trying to adapt FDB Futures to Rust Future, but I am not sure if the semantics are compatible.

In the design that I am currently exploring, rather than trying to adapt FDB Futures to Rust Future, my plan is to use Tokio’s blocking thread pool to manage FDB Futures. From what I understand, a blocking thread pool is also used under the hood by the JVM and Go runtimes.

The other design goal is that I want the Rust binding APIs to preserve the Java and Go API idioms as much as possible. That way, when I and others develop layers on the Rust bindings, we can easily get inspiration from layers written in other languages.

andrew.noyes · August 9, 2021, 4:55pm

It’s probably not a very common use case, but in theory you could have several threads waiting on the same future, and if one thread cancels it the other threads should be notified of that. The C API future actually maintains cancellation state internally, so you wouldn’t need to maintain it yourself.

I see you are modeling exclusive ownership of the future though so actually my example use case isn’t a concern.

rajivr · August 10, 2021, 8:32am

@andrew.noyes Thanks a lot for all the feedback so far.

I was wondering if you could please take a look at the retry logic for <FdbDatabase as TransactionContext>::run and <FdbDatabase as ReadTransactionContext>::read methods and let me know if I am implementing the retry logic correctly?

Unlike Go panics or Java exceptions, in Rust the Result type is used to signal errors. Therefore I am forcing the closure to return an FdbResult<T>.

As I was updating the crate documentation, I noticed that the Java API says: “Note: Client must call commit() and wait on the result on all transactions, even ones that only read….” Would this be relevant to Rust, as we don’t have a garbage collector?

andrew.noyes · August 10, 2021, 3:30pm

Yup, this looks like the standard retry logic.

I’m not sure I fully understand the java bindings recommendation, so maybe I’m not the best person to answer. I think the concern is that read futures don’t e.g. have a reference to the transaction that keeps it alive, and if you destroy the transaction while there are outstanding read futures those can fail with “transaction_cancelled”. Calling “commit” and waiting on the result will implicitly wait for all the read futures to complete, so it’s sufficient to avoid this problem. I don’t think there are any concerns with memory safety here. Basically if you think it’s possible that any of the read futures outlive the call to the callback passed to the retry loop, it might make sense to call commit and wait on the result.

rajivr · August 11, 2021, 2:07pm

Thanks @andrew.noyes.

Regarding C APIs fdb_future_get_value and fdb_transaction_get, could you please confirm if I am understanding the behavior of *out_present correctly?

When the key is absent, then *out_present is zero.
When the key is present, but with an empty value, then *out_present is non-zero, and *out_value_length is zero.
When the key is present, but with a non-empty value, then *out_present is non-zero, and *out_value_length is non-zero, along with a valid *out_value (owned by the future).
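The three cases above map naturally onto an `Option<Vec<u8>>`. The sketch below is a hypothetical helper (`copy_value` is not part of any binding) that takes the three out-parameters as plain arguments and copies the bytes into owned memory, on the assumption that the pointer is only read when the key is present and the length is non-zero.

```rust
use std::slice;

/// Interprets the out-parameters of `fdb_future_get_value` and copies the
/// value into memory owned by the caller.
///
/// # Safety
/// If `present != 0` and `len > 0`, `value` must point to `len` readable
/// bytes (in real code, memory still owned by a live future).
pub unsafe fn copy_value(present: i32, value: *const u8, len: i32) -> Option<Vec<u8>> {
    if present == 0 {
        // Key is absent.
        return None;
    }
    if len <= 0 {
        // Key is present with an empty value; the pointer is not touched.
        return Some(Vec::new());
    }
    // Key is present with a non-empty value: copy it out of the future.
    Some(unsafe { slice::from_raw_parts(value, len as usize) }.to_vec())
}
```

Keeping "absent" (`None`) distinct from "present but empty" (`Some(vec![])`) preserves the difference that `*out_present` encodes.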

andrew.noyes · August 11, 2021, 3:19pm

Yup, that’s all correct

rajivr · August 16, 2021, 1:09pm

Thanks @andrew.noyes. There is still some way to go, but here is the initial working hello world implementation.

I can now also share with you an example of copying memory owned by an FDB future in an implementation of the FdbFutureGet::get trait method.

Thanks again for patiently answering my questions. Please do let me know if you have any thoughts/comments.

I was also wondering if it would be possible to use error codes from 100 through 999 for the language binding layer?

andrew.noyes · August 16, 2021, 6:00pm

I don’t think there’s any guarantee these won’t be used by fdb in the future - maybe @ajbeamon knows

ajbeamon · August 16, 2021, 9:05pm

I think you’re right, we haven’t reserved any error codes for external usage as far as I know.

rajivr · August 17, 2021, 3:35am

Thanks @ajbeamon and @andrew.noyes.

There are currently two instances where I’m piggybacking on FDB error codes. One is here and the other is under development.

Once my bindings work is complete, maybe I could request a range from upstream.

rajivr · September 2, 2021, 1:29pm

@andrew.noyes I just completed the Rust Iterator implementation for range reads.

This was perhaps one of the trickiest APIs to implement thus far, and I am kind of glad it is done!

I’ve currently modeled the tests based on RangeQueryTest.java.

I could not find additional unit tests for range reads in other bindings. I was wondering if there are additional tests somewhere else that I can potentially steal ideas from? I also checked Go bindings for tests.
