I'm seeing a strange crash after “1031: Operation aborted because the transaction timed out” errors while reading from multiple futures in parallel:
ERROR: AddressSanitizer: SEGV on unknown address 0x7f9e30800016
The signal is caused by a READ memory access.
0x7f9ec55fe402 in fdb_future_destroy (/lib64/libfdb_c.so+0xd5402)
The documentation for fdb_transaction_on_error() does not make clear whether the original future must also be destroyed.
Pseudo-code:
You always need to destroy your futures, and at first glance that aspect of your code looks correct. I think the issue you may be running into comes from the following line:
Your intent was probably to pass future_error into fdb_future_get_error; passing error_retry instead will cause an invalid pointer dereference.
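To make the two points above concrete (every future gets destroyed, and the error lookup takes the future handle rather than an error code), here is a minimal sketch. It does not link against libfdb_c: MockFuture and the mock_* functions are hypothetical stand-ins I made up so the snippet compiles on its own, and destroy_and_clear and collect_reads are illustrative helpers, not part of the FDB API. The comments name the real call each stand-in represents. The mock on-error future simply hands back the code it was given, which is an assumption for the sake of the example.

```c
#include <assert.h>   /* handy for checking the destroy counts when experimenting */
#include <stddef.h>

/* Hypothetical stand-in for FDBFuture; NOT the real FoundationDB type. */
typedef struct {
    int error;      /* code the future resolves to; 0 means success */
    int destroyed;  /* number of times the stand-in destroy was called */
} MockFuture;

/* Stands in for fdb_future_get_error(): takes the future handle. */
static int mock_future_get_error(const MockFuture *f) { return f->error; }

/* Stands in for fdb_future_destroy(); only counts calls here. */
static void mock_future_destroy(MockFuture *f) { f->destroyed++; }

/* Stands in for fdb_transaction_on_error(): returns a NEW future that
   must be destroyed separately from the futures of the original reads.
   Assumption: it resolves to the same code it was given. */
static MockFuture *mock_transaction_on_error(MockFuture *storage, int code) {
    storage->error = code;
    storage->destroyed = 0;
    return storage;
}

/* Illustrative helper: destroy once, then clear the caller's pointer so
   a later cleanup pass cannot destroy the same future a second time. */
static void destroy_and_clear(MockFuture **slot) {
    if (*slot != NULL) {
        mock_future_destroy(*slot);
        *slot = NULL;
    }
}

/* Collects n outstanding reads. Returns 0 on success, otherwise the
   code reported by the on-error future. */
int collect_reads(MockFuture *reads[], size_t n, MockFuture *on_error_storage) {
    int first_error = 0;
    for (size_t i = 0; i < n; i++) {
        int err = mock_future_get_error(reads[i]);
        if (err != 0 && first_error == 0) first_error = err;
        destroy_and_clear(&reads[i]);   /* on success and on failure */
    }
    if (first_error == 0) return 0;

    MockFuture *future_error =
        mock_transaction_on_error(on_error_storage, first_error);
    /* Pass the future (future_error) here, not the error code. */
    int error_retry = mock_future_get_error(future_error);
    destroy_and_clear(&future_error);
    return error_retry;
}
```

With three reads where the second resolves to 1031, collect_reads returns 1031, each read future is destroyed exactly once, and the on-error future is destroyed exactly once as well. Clearing the pointer after the destroy is just one way to guard against a second destroy from a shared cleanup path; it is not something the FDB API requires.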
Does that mean that making the fix I suggested didn’t solve all of your crashes? The example you originally pasted crashed for me as written but works if I make that change.
Do you have some code for a reproduction that I could play with?
Yes, the above pseudo-code (“error_retry” -> “future_error”) is wrong.
But this is not the problem.
So far I could not create a reproducible testcase.
It happens when I read multiple records within a transaction and one (?) of the reads times out because the server is overloaded (a cheap Intel NUC in my case).
I will compile FDB with AddressSanitizer; maybe it will report some useful information.
I forgot to mention that it is the client that crashes, not the server. The segfault is rare.
The future passed to fdb_future_destroy() is correct.
FDB 6.1.2, compiled with USE_ASAN=on:
=================================================================
==26316==ERROR: AddressSanitizer: SEGV on unknown address 0x7f1c5280001a (pc 0x7f1c42679efb bp 0x7fff98c214e0 sp 0x7fff98c1db80 T0)
==26316==The signal is caused by a READ memory access.
#0 0x7f1c42679efa in fdb_future_destroy /home/login/foundationdb-6.1.2/bindings/c/fdb_c.cpp:154
#1 0x16e3aea in load_records foundationdb.cpp:982
[...snip...]
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/login/foundationdb-6.1.2/bindings/c/fdb_c.cpp:154 in fdb_future_destroy
==26316==ABORTING
As far as I can tell, this happens only when:
a) multiple records are read in parallel, and
b) a read returns the error: 1031: Operation aborted because the transaction timed out
ASAN is not very helpful. Is there a compile option to get a better backtrace?