Destroying futures after 'fdb_transaction_on_error()'

I am seeing a strange crash after “1031: Operation aborted because the transaction timed out” errors while reading from multiple futures in parallel:

ERROR: AddressSanitizer: SEGV on unknown address 0x7f9e30800016
The signal is caused by a READ memory access.
0x7f9ec55fe402 in fdb_future_destroy (/lib64/libfdb_c.so+0xd5402)

The documentation of fdb_transaction_on_error() does not make clear whether the original future must also be destroyed.
Pseudo-code:

FDBFuture *future_read = fdb_transaction_get( transaction, "test", 4, true );
fdb_error_t error_read = fdb_future_block_until_ready( future_read );
if( error_read == 0 )
    error_read = fdb_future_get_error( future_read );
if( error_read != 0 )
{
    FDBFuture *future_error = fdb_transaction_on_error( transaction, error_read );
    fdb_error_t error_retry = fdb_future_block_until_ready( future_error );
    if( error_retry == 0 )
        error_retry = fdb_future_get_error( error_retry );
    fdb_future_destroy( future_error );
    // Here be dragons: Is the next 'fdb_future_destroy' required?
    fdb_future_destroy( future_read );
}

You always need to destroy your futures, and at first glance that aspect of your code looks correct. I think the issue you may be running into comes from the following line:

        error_retry = fdb_future_get_error( error_retry );

Your intent was probably to pass future_error into fdb_future_get_error. Passing error_retry, which is an fdb_error_t rather than a future pointer, is going to cause an invalid pointer dereference.

This happens only if these two conditions are met:

a) The FDB server times out after 5 seconds (it is overloaded due to swapping in my case)

b) Multiple reads in parallel, pseudo-code:

FDBFuture *futures[10];
for( int i=0; i < 10; ++i )
    futures[i] = fdb_transaction_get( transaction, (const uint8_t *)&i, sizeof( i ), true );
for( int i=0; i < 10; ++i )
    fdb_future_block_until_ready( futures[i] );

I have core dumps, but they are not very helpful. I will try to compile FDB with ASAN.

Does that mean that making the fix I suggested didn’t solve all of your crashes? The example you originally pasted crashed for me as written but works if I make that change.

Do you have some code for a reproduction that I could play with?

Yes, the pseudo-code above is wrong (“error_retry” should be “future_error”), but that is not the cause of the crash.

So far I could not create a reproducible testcase.
It happens when I read multiple records within a transaction and (one?) read times out because the server is overloaded (a cheap Intel NUC in my case).
I will compile FDB with address-sanitizer - maybe it will report some useful information.

I forgot to mention that it is the client that crashes, not the server. The segfault is rare.
The future passed to fdb_future_destroy() is correct.

FDB 6.1.2 - compiled USE_ASAN=on

=================================================================
==26316==ERROR: AddressSanitizer: SEGV on unknown address 0x7f1c5280001a (pc 0x7f1c42679efb bp 0x7fff98c214e0 sp 0x7fff98c1db80 T0)
==26316==The signal is caused by a READ memory access.
    #0 0x7f1c42679efa in fdb_future_destroy /home/login/foundationdb-6.1.2/bindings/c/fdb_c.cpp:154
    #1 0x16e3aea in load_records foundationdb.cpp:982
    [...snip...]

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/login/foundationdb-6.1.2/bindings/c/fdb_c.cpp:154 in fdb_future_destroy
==26316==ABORTING

As far as I can tell, this happens only when:
a) multiple records are read in parallel
b) a read returns the error: 1031: Operation aborted because the transaction timed out

ASAN is not very helpful. Is there a compile option to get a better backtrace?

Correction: my build above did not, and still does not, use AddressSanitizer.

This is what I did:

tar xvfz 6.1.2.tar.gz
cd foundationdb-6.1.2/build
cmake USE_ASAN=ON CMAKE_VERBOSE_MAKEFILE=ON -DBOOST_ROOT=/tmp/backup/boost_1_67_0 ..
make

But it compiles without “-fno-omit-frame-pointer -fsanitize=address”.

How do you enable address sanitizer?

The proper syntax for cmake variables is -DVARIABLE=VALUE, so your command should look like this:

cmake -DUSE_ASAN=ON -DCMAKE_VERBOSE_MAKEFILE=ON -DBOOST_ROOT=/tmp/backup/boost_1_67_0 ..

@ajbeamon
@markus.pilman

This issue was my fault: I accidentally called fdb_future_destroy() twice on the same future, which led to a use-after-free.
Sorry for the inconvenience.
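
For anyone who runs into the same double-destroy problem, one common guard is to clear the pointer right after destroying it, so that a second cleanup path becomes a no-op. The following is only a minimal sketch of that idea. It does not use the FDB C API: Future, future_destroy, future_destroy_once and demo_double_cleanup are hypothetical stand-ins invented for illustration, with future_destroy playing the role of fdb_future_destroy().

```c
#include <stdlib.h>

/* Hypothetical stand-in for an FDBFuture; not part of the FDB C API. */
typedef struct Future {
    int value;
} Future;

/* Counts real destructions, for demonstration only. */
int destroy_calls = 0;

/* Stand-in for fdb_future_destroy(): releases the object. */
void future_destroy(Future *f) {
    destroy_calls++;
    free(f);
}

/* Destroys the future held in *slot at most once, then clears the slot
   so that a second call on the same slot does nothing. */
void future_destroy_once(Future **slot) {
    if (slot != NULL && *slot != NULL) {
        future_destroy(*slot);
        *slot = NULL;
    }
}

/* Demonstration: two cleanup paths reach the same slot, but only the
   first one destroys. Returns the number of real destructions, or -1
   if the allocation failed. */
int demo_double_cleanup(void) {
    Future *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    future_destroy_once(&f); /* e.g. the error path */
    future_destroy_once(&f); /* e.g. the common cleanup path: no-op */
    return destroy_calls;
}
```

Note that this only protects a single pointer variable. If the same future pointer has been copied into a second variable or array slot, clearing one copy does not stop a second destroy through the other.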