What is the life time of key, value and other arguments passed to fdb_transaction_*

I am working on a binding for elixir language https://github.com/ananthakumaran/fdb. The C API documentation is not quite clear about the life time of arguments passed to fdb_transaction_* functions. The *_set_option clearly states that the arguments can be freed after the function returns. But most of the other functions don’t specify for how long the pointers should be kept valid.

It also seems like FDBTransaction implicitly depends on FDBDatabase and FDBCluster. The binding was crashing under valgrind until I made sure the database and cluster are not destroyed before transaction. Same case with FDBFuture, if the FDBTransaction is destroyed before the future is resolved, (for example garbage collector kicks in and figures out the transaction is not referenced anymore) future returns error when resolved.

It would be nice if this is documented clearly because these don’t seem to occur during normal testing.

PS. I am not an experienced c/c++ programmer. I mostly work with languages that support garbage collection, so it’s possible that I might be missing the obvious here

Arguments passed in to the functions should be ok to free once the call returns to you.

I don’t think that futures depend on transactions, but if the operation that is being performed is being done on a transaction that gets destroyed, the operation will be cancelled. This will result in an error (transaction_cancelled, I think) bubbling out through the future.

I would have expected that destroying the cluster or database while a transaction was open wouldn’t cause a crash, so long as you don’t destroy the object more than once (as documented). Transactions and Databases hold references to their parents which should keep them alive as long as the child object needs them. I tried a simple test to verify that this is true, so if there is a bug here it may be a little more subtle and some sample code might help to track it down.

I guess the issue might have something to do with clear_range. The database is flushed before running every test. The code below contains a sample

  def flushdb do
    t = new_transaction()
    :ok = Transaction.clear_range(t, "", <<0xFF>>)
    Transaction.commit(t)
  end

  setup do
    flushdb()
  end

  test "resource early garbage collection" do
    parent = self()

    # A temp process is used to trigger garbage collection of cluster
    # & database.
    spawn_link(fn ->
      send(parent, new_transaction())
    end)

    receive do
      t ->
        :ok = Transaction.set_option(t, FDB.Option.transaction_option_access_system_keys())
        assert Transaction.get(t, "\xff\xff/status/json")
        assert Transaction.get(t, "\xff\xff/cluster_file_path")
    end
  end
==2642== Memcheck, a memory error detector
==2642== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==2642== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==2642== Command: /otp_src_20.3/bin/x86_64-unknown-linux-gnu/beam.valgrind.smp -S4:4 -- -root /otp_src_20.3 -progname /otp_src_20.3/bin/cerl\ -valgrind -- -home /root -- -pa /elixir/bin/../lib/eex/ebin /elixir/bin/../lib/elixir/ebin /elixir/bin/../lib/ex_unit/ebin /elixir/bin/../lib/iex/ebin /elixir/bin/../lib/logger/ebin /elixir/bin/../lib/mix/ebin -noshell -s elixir start_cli -extra /elixir/bin/mix test test/fdb_stress_test.exs:52
==2642== 
==2642== Warning: set address range perms: large range [0x3a056000, 0x7a056000) (noaccess)
make: Nothing to be done for 'all'.
Including tags: [line: "52"]
Excluding tags: [:test]

==2642== Thread 33 fdb:
==2642== Invalid read of size 4
==2642==    at 0xC04AE97: addref (FastRef.h:71)
==2642==    by 0xC04AE97: operator() (ThreadSafeTransaction.actor.cpp:120)
==2642==    by 0xC04AE97: a_body1cont1 (ThreadHelper.actor.h:574)
==2642==    by 0xC04AE97: a_body1when1 (ThreadHelper.actor.g.h:896)
==2642==    by 0xC04AE97: a_callback_fire (ThreadHelper.actor.g.h:910)
==2642==    by 0xC04AE97: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}> const&) (flow.h:928)
==2642==    by 0xBFD7007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==2642==    by 0xC1E6662: send<Void> (flow.h:708)
==2642==    by 0xC1E6662: operator() (Net2.actor.cpp:473)
==2642==    by 0xC1E6662: N2::Net2::run() (Net2.actor.cpp:628)
==2642==    by 0xBF43627: runNetwork() (NativeAPI.actor.cpp:863)
==2642==    by 0xBF1731B: MultiVersionApi::runNetwork() (MultiVersionTransaction.actor.cpp:1197)
==2642==    by 0xBEF5B98: fdb_run_network (fdb_c.cpp:119)
==2642==    by 0xB941CAA: run_network_wrapper (fdb_nif.c:115)
==2642==    by 0x5E16E1: erl_drv_thread_wrapper (erl_drv_thread.c:122)
==2642==    by 0x6BFA99: thr_wrapper (ethread.c:118)
==2642==    by 0x578F6A9: start_thread (pthread_create.c:333)
==2642==  Address 0x95b5dc0 is 0 bytes inside a block of size 968 free'd
==2642==    at 0x4C2CE10: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2642==    by 0xC04A377: delref (FastRef.h:74)
==2642==    by 0xC04A377: operator() (ThreadSafeTransaction.actor.cpp:106)
==2642==    by 0xC04A377: a_body1cont1 (ThreadHelper.actor.h:574)
==2642==    by 0xC04A377: a_body1when1 (ThreadHelper.actor.g.h:896)
==2642==    by 0xC04A377: a_callback_fire (ThreadHelper.actor.g.h:910)
==2642==    by 0xC04A377: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeDatabase::~ThreadSafeDatabase()::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeDatabase::~ThreadSafeDatabase()::{lambda()#1}> const&) (flow.h:928)
==2642==    by 0xBFD7007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==2642==    by 0xC1E6662: send<Void> (flow.h:708)
==2642==    by 0xC1E6662: operator() (Net2.actor.cpp:473)
==2642==    by 0xC1E6662: N2::Net2::run() (Net2.actor.cpp:628)
==2642==    by 0xBF43627: runNetwork() (NativeAPI.actor.cpp:863)
==2642==    by 0xBF1731B: MultiVersionApi::runNetwork() (MultiVersionTransaction.actor.cpp:1197)
==2642==    by 0xBEF5B98: fdb_run_network (fdb_c.cpp:119)
==2642==    by 0xB941CAA: run_network_wrapper (fdb_nif.c:115)
==2642==    by 0x5E16E1: erl_drv_thread_wrapper (erl_drv_thread.c:122)
==2642==    by 0x6BFA99: thr_wrapper (ethread.c:118)
==2642==    by 0x578F6A9: start_thread (pthread_create.c:333)
==2642== 

full log can be found here https://gist.github.com/ananthakumaran/9b8a9f525c522ba5b3f08543837294db

There error goes away when

  1. the flushdb call before the test is removed, then all the tests run without errors from valgrind
  2. if I try to trace the exact call sequence. I guess fprintf causes enough delay to mask the issue
==2779== Memcheck, a memory error detector
==2779== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==2779== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==2779== Command: /otp_src_20.3/bin/x86_64-unknown-linux-gnu/beam.valgrind.smp -S4:4 -- -root /otp_src_20.3 -progname /otp_src_20.3/bin/cerl\ -valgrind -- -home /root -- -pa /elixir/bin/../lib/eex/ebin /elixir/bin/../lib/elixir/ebin /elixir/bin/../lib/ex_unit/ebin /elixir/bin/../lib/iex/ebin /elixir/bin/../lib/logger/ebin /elixir/bin/../lib/mix/ebin -noshell -s elixir start_cli -extra /elixir/bin/mix test test/fdb_stress_test.exs:52
==2779== 
==2779== Warning: set address range perms: large range [0x3a056000, 0x7a056000) (noaccess)
make: Nothing to be done for 'all'.
Including tags: [line: "52"]
Excluding tags: [:test]

fdb_cluster_create
fdb_future_get_*
fdb_cluster_create_database
fdb_future_get_*
fdb_future_destroy
fdb_future_destroy
fdb_cluster_destroy
fdb_database_create_transaction
fdb_transaction_commit
fdb_future_get_*
fdb_cluster_create
fdb_future_get_*
fdb_cluster_create_database
fdb_future_get_*
fdb_database_create_transaction
fdb_database_destroy
fdb_future_destroy
fdb_cluster_destroy
fdb_future_destroy
fdb_transaction_get
fdb_future_get_*
fdb_transaction_get
fdb_future_get_*
fdb_future_destroy
fdb_future_destroy
fdb_transaction_destroy
fdb_database_destroy
.fdb_future_destroy
fdb_transaction_destroy


Finished in 3.5 seconds
3 tests, 0 failures, 2 skipped

Randomized with seed 895969
==2779== 
==2779== HEAP SUMMARY:
==2779==     in use at exit: 33,210,142 bytes in 28,684 blocks
==2779==   total heap usage: 203,607 allocs, 174,923 frees, 268,317,204 bytes allocated
==2779== 
==2779== 480 bytes in 10 blocks are definitely lost in loss record 628 of 820
==2779==    at 0x4C2BBA0: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2779==    by 0x626648: erts_sys_alloc (sys.c:1206)
==2779==    by 0x59796F: erts_alloc (erl_alloc.h:232)
==2779==    by 0x598524: erts_thr_progress_register_unmanaged_thread (erl_thr_progress.c:531)
==2779==    by 0x5AE7B8: async_thread_init (erl_async.c:497)
==2779==    by 0x5AE89C: async_main (erl_async.c:524)
==2779==    by 0x6BFA99: thr_wrapper (ethread.c:118)
==2779==    by 0x578F6A9: start_thread (pthread_create.c:333)
==2779== 
==2779== LEAK SUMMARY:
==2779==    definitely lost: 480 bytes in 10 blocks
==2779==    indirectly lost: 0 bytes in 0 blocks
==2779==      possibly lost: 919,274 bytes in 4,517 blocks
==2779==    still reachable: 32,290,388 bytes in 24,157 blocks
==2779==         suppressed: 0 bytes in 0 blocks
==2779== Reachable blocks (those to which a pointer was found) are not shown.
==2779== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2779== 
==2779== For counts of detected and suppressed errors, rerun with: -v
==2779== ERROR SUMMARY: 220 errors from 220 contexts (suppressed: 0 from 0)
  1. if I keep a reference to database & cluster in transaction, thereby preventing the VM from calling database_destroy and or cluster_destroy before transaction_destroy, then the error goes away.

I am using library version 5.1.7. Let me know if this gives enough clues. Otherwise, I will try to track down further when I get free time.

It’s a bit hard to say what’s happening without seeing the calls to destroy here, but I will note that the line reported for the invalid read (ThreadSafeTransaction.actor.cpp:120) is in the constructor for ThreadSafeTransaction. That makes me think that the database is being destroyed before/while it is being used to create a transaction, which could certainly lead to this behavior.

I’d check to see if it’s possible that your client is not waiting until the transaction is created to destroy the database. If it is correctly waiting, we may have to see if there’s some way the calls could be reordered internally.

I will try to check more (inability to add printf makes it difficult for me to get the exact call sequence). I think it’s unlikey for database to be destroyed while transaction is being created. fdb_database_create_transaction itself is synchrounous. Here is the call and corresponding c implementation. VM can’t destroy database.resource till the call returns

    resource =
      Native.database_create_transaction(database.resource)
      |> Utils.verify_result()
database_create_transaction(ErlNifEnv *env, int argc,
                            const ERL_NIF_TERM argv[]) {
  Database *database;
  FDBTransaction *fdb_transaction;
  Reference *reference;
  fdb_error_t error;
  ERL_NIF_TERM result;
  VERIFY_ARGV(enif_get_resource(env, argv[0], DATABASE_RESOURCE_TYPE,
                                (void **)&database),
              "database");
  error = fdb_database_create_transaction(database->handle, &fdb_transaction);
  if (error) {
    return enif_make_tuple2(env, enif_make_int(env, error),
                            make_atom(env, "nil"));
  }
  reference = reference_resource_create(database, NULL);
  result = fdb_transaction_to_transaction(env, fdb_transaction, reference);
  return enif_make_tuple2(env, enif_make_int(env, error), result);
}

fdb_database_create_transaction synchronously returns the transaction, but it does schedule work to be run on the network thread, and it’s on that thread where the bad access is happening. Likewise, calling fdb_database_destroy causes the destruction to scheduled on the network thread. These tasks should be being run in the order that they are scheduled, assuming they aren’t being scheduled simultaneously.

One possible exception to the above (based on a cursory reading, I could be wrong) is that it appears an action scheduled from the network thread can move ahead of an action already scheduled from another thread. If you were calling fdb_database_destroy on the network thread (i.e. on the thread that callbacks are usually called on) but fdb_create_transaction on a different thread, perhaps that could result in this behavior.

I think I found a small case. Just creating multiple transactions causes the problem in elixir also. You might have to run it multiple times. Let me know if there are any issues in the code itself.

#define FDB_API_VERSION 510

#include "assert.h"
#include "foundationdb/fdb_c.h"
#include "pthread.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"

void *run_network() {
  fdb_run_network();
  return NULL;
}

void new_transaction() {
  FDBTransaction *transaction;
  FDBDatabase *database;
  FDBCluster *cluster;
  FDBFuture *future;

  future = fdb_create_cluster(NULL);
  fdb_error_t error;
  error = fdb_future_block_until_ready(future);
  assert(error == 0);
  error = fdb_future_get_cluster(future, &cluster);
  assert(error == 0);
  fdb_future_destroy(future);

  const char *db = "DB";
  future = fdb_cluster_create_database(cluster, (const uint8_t *)db, 2);
  error = fdb_future_block_until_ready(future);
  assert(error == 0);
  error = fdb_future_get_database(future, &database);
  assert(error == 0);
  fdb_future_destroy(future);

  error = fdb_database_create_transaction(database, &transaction);
  assert(error == 0);

  fdb_transaction_destroy(transaction);
  fdb_database_destroy(database);
  fdb_cluster_destroy(cluster);
}

void test() {
  for (int i = 0; i < 1000; i++) {
    new_transaction();
  }
}

int main() {
  fdb_error_t error;
  int result;
  error = fdb_select_api_version(510);
  assert(error == 0);
  error = fdb_setup_network();
  assert(error == 0);
  pthread_t network_thread;
  result = pthread_create(&network_thread, NULL, *run_network, NULL);
  assert(result == 0);
  test();
  error = fdb_stop_network();
  assert(error == 0);
  result = pthread_join(network_thread, NULL);
  assert(result == 0);
  return 0;
}

full log https://gist.github.com/1c845de8cfd56711791c72a6a65f1a22

make clean && make db && make run
rm -rf db
/usr/bin/gcc -g -O3 -ansi -pedantic -Wall -Wextra -Wno-unused-parameter -std=gnu99 -fPIC  -L/usr/local/lib/ -L/usr/lib/ -Wl,--no-as-needed -lm -lpthread -lrt -lstdc++ -lfdb_cdb.c   -o db
valgrind ./db
==926== Memcheck, a memory error detector
==926== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==926== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==926== Command: ./db
==926== 
==926== Thread 2:
==926== Invalid read of size 4
==926==    at 0x5A68E97: addref (FastRef.h:71)
==926==    by 0x5A68E97: operator() (ThreadSafeTransaction.actor.cpp:120)
==926==    by 0x5A68E97: a_body1cont1 (ThreadHelper.actor.h:574)
==926==    by 0x5A68E97: a_body1when1 (ThreadHelper.actor.g.h:896)
==926==    by 0x5A68E97: a_callback_fire (ThreadHelper.actor.g.h:910)
==926==    by 0x5A68E97: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}> const&) (flow.h:928)
==926==    by 0x59F5007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==926==    by 0x5C04662: send<Void> (flow.h:708)
==926==    by 0x5C04662: operator() (Net2.actor.cpp:473)
==926==    by 0x5C04662: N2::Net2::run() (Net2.actor.cpp:628)
==926==    by 0x5961627: runNetwork() (NativeAPI.actor.cpp:863)
==926==    by 0x593531B: MultiVersionApi::runNetwork() (MultiVersionTransaction.actor.cpp:1197)
==926==    by 0x5913B98: fdb_run_network (fdb_c.cpp:119)
==926==    by 0x400D6A: run_network (db.c:11)
==926==    by 0x51476A9: start_thread (pthread_create.c:333)
==926==    by 0x6168EEC: clone (clone.S:109)
==926==  Address 0x68c2ce0 is 0 bytes inside a block of size 968 free'd
==926==    at 0x4C2CE10: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==926==    by 0x5A68377: delref (FastRef.h:74)
==926==    by 0x5A68377: operator() (ThreadSafeTransaction.actor.cpp:106)
==926==    by 0x5A68377: a_body1cont1 (ThreadHelper.actor.h:574)
==926==    by 0x5A68377: a_body1when1 (ThreadHelper.actor.g.h:896)
==926==    by 0x5A68377: a_callback_fire (ThreadHelper.actor.g.h:910)
==926==    by 0x5A68377: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeDatabase::~ThreadSafeDatabase()::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeDatabase::~ThreadSafeDatabase()::{lambda()#1}> const&) (flow.h:928)
==926==    by 0x59F5007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==926==    by 0x5C04662: send<Void> (flow.h:708)
==926==    by 0x5C04662: operator() (Net2.actor.cpp:473)
==926==    by 0x5C04662: N2::Net2::run() (Net2.actor.cpp:628)
==926==    by 0x5961627: runNetwork() (NativeAPI.actor.cpp:863)
==926==    by 0x593531B: MultiVersionApi::runNetwork() (MultiVersionTransaction.actor.cpp:1197)
==926==    by 0x5913B98: fdb_run_network (fdb_c.cpp:119)
==926==    by 0x400D6A: run_network (db.c:11)
==926==    by 0x51476A9: start_thread (pthread_create.c:333)
==926==    by 0x6168EEC: clone (clone.S:109)
==926== 
==926== Invalid read of size 4
==926==    at 0x597468C: Transaction::Transaction(Database const&) (NativeAPI.actor.cpp:1770)
==926==    by 0x5AAD822: ReadYourWritesTransaction::ReadYourWritesTransaction(Database const&) (ReadYourWrites.actor.cpp:1103)
==926==    by 0x5A68EA2: operator() (ThreadSafeTransaction.actor.cpp:120)
==926==    by 0x5A68EA2: a_body1cont1 (ThreadHelper.actor.h:574)
==926==    by 0x5A68EA2: a_body1when1 (ThreadHelper.actor.g.h:896)
==926==    by 0x5A68EA2: a_callback_fire (ThreadHelper.actor.g.h:910)
==926==    by 0x5A68EA2: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}> const&) (flow.h:928)
==926==    by 0x59F5007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==926==    by 0x5C04662: send<Void> (flow.h:708)
==926==    by 0x5C04662: operator() (Net2.actor.cpp:473)
==926==    by 0x5C04662: N2::Net2::run() (Net2.actor.cpp:628)
==926==    by 0x5961627: runNetwork() (NativeAPI.actor.cpp:863)
==926==    by 0x593531B: MultiVersionApi::runNetwork() (MultiVersionTransaction.actor.cpp:1197)
==926==    by 0x5913B98: fdb_run_network (fdb_c.cpp:119)
==926==    by 0x400D6A: run_network (db.c:11)
==926==    by 0x51476A9: start_thread (pthread_create.c:333)
==926==    by 0x6168EEC: clone (clone.S:109)
==926==  Address 0x68c3090 is 944 bytes inside a block of size 968 free'd
==926==    at 0x4C2CE10: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==926==    by 0x5A68377: delref (FastRef.h:74)
==926==    by 0x5A68377: operator() (ThreadSafeTransaction.actor.cpp:106)
==926==    by 0x5A68377: a_body1cont1 (ThreadHelper.actor.h:574)
==926==    by 0x5A68377: a_body1when1 (ThreadHelper.actor.g.h:896)
==926==    by 0x5A68377: a_callback_fire (ThreadHelper.actor.g.h:910)
==926==    by 0x5A68377: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeDatabase::~ThreadSafeDatabase()::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeDatabase::~ThreadSafeDatabase()::{lambda()#1}> const&) (flow.h:928)
==926==    by 0x59F5007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==926==    by 0x5C04662: send<Void> (flow.h:708)
==926==    by 0x5C04662: operator() (Net2.actor.cpp:473)
==926==    by 0x5C04662: N2::Net2::run() (Net2.actor.cpp:628)
==926==    by 0x5961627: runNetwork() (NativeAPI.actor.cpp:863)
==926==    by 0x593531B: MultiVersionApi::runNetwork() (MultiVersionTransaction.actor.cpp:1197)
==926==    by 0x5913B98: fdb_run_network (fdb_c.cpp:119)
==926==    by 0x400D6A: run_network (db.c:11)
==926==    by 0x51476A9: start_thread (pthread_create.c:333)
==926==    by 0x6168EEC: clone (clone.S:109)
==926== 
==926== Invalid read of size 8
==926==    at 0x596250A: Transaction::createTrLogInfoProbabilistically(Database const&) (NativeAPI.actor.cpp:2951)
==926==    by 0x597471B: Transaction::Transaction(Database const&) (NativeAPI.actor.cpp:1770)
==926==    by 0x5AAD822: ReadYourWritesTransaction::ReadYourWritesTransaction(Database const&) (ReadYourWrites.actor.cpp:1103)
==926==    by 0x5A68EA2: operator() (ThreadSafeTransaction.actor.cpp:120)
==926==    by 0x5A68EA2: a_body1cont1 (ThreadHelper.actor.h:574)
==926==    by 0x5A68EA2: a_body1when1 (ThreadHelper.actor.g.h:896)
==926==    by 0x5A68EA2: a_callback_fire (ThreadHelper.actor.g.h:910)
==926==    by 0x5A68EA2: ActorCallback<(anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}>, 0, Void>::fire((anonymous namespace)::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeDatabase*)::{lambda()#1}> const&) (flow.h:928)
==926==    by 0x59F5007: void SAV<Void>::send<Void>(Void&&) (flow.h:382)
==926==    by 0x5C04662: send<Void> (flow.h:708)
==926==    by 0x5C04662: operator() (Net2.actor.cpp:473)

Thanks, I’ll try to give it a run later today.

I may have an idea what’s happening here. fdb_cluster_create_database does the actual database creation on the network thread, and then signals completion via a future. It’s holding onto a couple references to the database object (which in turn holds a reference to another object, say dbContext, which is shared with the transaction). When the job is done and it signals the future, the C code is immediately getting that database object, using it to create a transaction, and then destroying it.

If the above completes in the following order, then I think we run into problems:

  1. Future is signaled
  2. Client code runs at least through fdb_database_destroy
  3. The task that signaled the future finishes wrapping up and drops its references to the database object.

The expectation was that 3 completes before 2, in which case the call to fdb_database_destroy schedules the last decrement of the database object, causing the dbContext reference count to be decremented after the transaction increments it.

In the case where 2 happens before 3, then the last reference to the database object is deleted on the network thread. It schedules the decrement of the dbContext reference count, but as I mentioned in a previous message, the fact that this happens on the network thread means that it can jump in line in front of the transaction work that increments the reference count. As a result, dbContext is deleted prematurely.

This is a good find, thanks for the reproduction. I’ll file an issue in GitHub to address it.

EDIT: https://github.com/apple/foundationdb/issues/518

As a side note, the reason we probably haven’t seen this previously is that we don’t typically delete databases so quickly after creating them. The recommended practice is to keep the same database object around for as long as you’ll want to be creating transactions and then close it only when you’re done interacting with the database.

It took me a while to understand your explanation (the key thing I missed is that the race you are worried about is to drop the last reference count on the ThreadSafeDatabase object, not on the DatabaseContext. If the winner of the former race is the main thread, then the latter may happen early).

I’m not sure, but I think that if (as it seems) Net2::onMainThread() isn’t linearizable when called from the main thread, that is the bug. There’s no documentation in the interface that it is supposed to preserve ordering, but clearly we rely on it. Client code calling the API is allowed to run on the main thread (via future callbacks), so I think there is no way to ensure that the last reference to (e.g.) a ThreadSafeDatabase isn’t dropped from there. Moreover, I think there are probably other ways (having nothing to do with reference counting) to violate API ordering requirements this way.

I don’t know if the right implementation (from a performance perspective) is to empty the thread queue into ready before adding onMainThread tasks to ready, or to defer onMainThread tasks to the next run loop cycle, or something else.

As always, the other important thing to look at is how to better test for this type of problem!

A fix for this has been merged and will be part of the upcoming 6.0 release.

https://github.com/apple/foundationdb/pull/610

1 Like