Memory leak in client profiling code

It appears that turning on client profiling on the cluster (profile client set 0.5 10MB, for instance) causes the client library to leak memory. valgrind doesn’t seem to catch it, so I ended up using heaptrack. I’m afraid I don’t have time to understand the arena allocation stuff well enough to figure out what’s actually happening, or to submit an issue/PR. Here’s a stack trace down to the culprit allocation:

__clone in ?? (libc.so.6)
start_thread in ?? (libpthread.so.0)
fdb_run_network in fdb_c.cpp:124 (libfdb_c.so)
MultiVersionApi::runNetwork() in MultiVersionTransaction.actor.cpp:1182 (libfdb_c.so)
ThreadSafeApi::runNetwork() in ThreadSafeTransaction.actor.cpp:367 (libfdb_c.so)
runNetwork() in NativeAPI.actor.cpp:1019 (libfdb_c.so)
N2::Net2::run() in Net2.actor.cpp:1115 (libfdb_c.so)
N2::PromiseTask::operator()() in Net2.actor.cpp:843 (libfdb_c.so)
void Promise<Void>::send<Void>(Void&&) const in flow.h:779 (libfdb_c.so)
void SAV<Void>::send<Void>(Void&&) in flow.h:447 (libfdb_c.so)
ActorCallback<(anonymous namespace)::DeliverActor, 0, Void>::fire(Void const&) in flow.h:999 (libfdb_c.so)
(anonymous namespace)::DeliverActorState<(anonymous namespace)::DeliverActor>::a_callback_fire(ActorCallback<(anonymous namespace)::DeliverActor, 0, Void>*, Void) in FlowTransport.actor.g.cpp:2452 (libfdb_c.so)
(anonymous namespace)::DeliverActorState<(anonymous namespace)::DeliverActor>::a_body1when1(Void const&, int) in FlowTransport.actor.g.cpp:2437 (libfdb_c.so)
(anonymous namespace)::DeliverActorState<(anonymous namespace)::DeliverActor>::a_body1cont2(Void const&, int) in FlowTransport.actor.g.cpp:2431 (libfdb_c.so)
(anonymous namespace)::DeliverActorState<(anonymous namespace)::DeliverActor>::a_body1cont1(int) in FlowTransport.actor.cpp:654 (libfdb_c.so)
NetSAV<CommitID>::receive(ArenaObjectReader&) in fdbrpc.h:111 (libfdb_c.so)
void SAV<CommitID>::sendAndDelPromiseRef<CommitID&>(CommitID&) in flow.h:472 (libfdb_c.so)
SAV<CommitID>::finishSendAndDelPromiseRef() in flow.h:479 (libfdb_c.so)
ActorCallback<(anonymous namespace)::WaitValueOrSignalActor<CommitID>, 0, CommitID>::fire(CommitID const&) in flow.h:999 (libfdb_c.so)
(anonymous namespace)::WaitValueOrSignalActorState<CommitID, (anonymous namespace)::WaitValueOrSignalActor<CommitID> >::a_callback_fire(ActorCallback<(anonymous namespace)::WaitValueOrSignalActor<CommitID>, 0, CommitID>*, CommitID) in genericactors.actor.g.h:2548 (libfdb_c.so)
(anonymous namespace)::WaitValueOrSignalActorState<CommitID, (anonymous namespace)::WaitValueOrSignalActor<CommitID> >::a_body1loopBody1when1(CommitID const&, int) in genericactors.actor.g.h:2519 (libfdb_c.so)
SAV<ErrorOr<CommitID> >::finishSendAndDelPromiseRef() in flow.h:479 (libfdb_c.so)
ActorCallback<(anonymous namespace)::MakeRequestActor<CommitTransactionRequest>, 1, ErrorOr<CommitID> >::fire(ErrorOr<CommitID> const&) in flow.h:999 (libfdb_c.so)
(anonymous namespace)::MakeRequestActorState<CommitTransactionRequest, (anonymous namespace)::MakeRequestActor<CommitTransactionRequest> >::a_callback_fire(ActorCallback<(anonymous namespace)::MakeRequestActor<CommitTransactionRequest>, 1, ErrorOr<CommitID> >*, ErrorOr<CommitID>) in LoadBalance.actor.g.h:333 (libfdb_c.so)
(anonymous namespace)::MakeRequestActorState<CommitTransactionRequest, (anonymous namespace)::MakeRequestActor<CommitTransactionRequest> >::a_body1cont1when1(ErrorOr<CommitID> const&, int) in LoadBalance.actor.g.h:318 (libfdb_c.so)
(anonymous namespace)::MakeRequestActorState<CommitTransactionRequest, (anonymous namespace)::MakeRequestActor<CommitTransactionRequest> >::a_body1cont3(ErrorOr<CommitID> const&, int) in LoadBalance.actor.g.h:300 (libfdb_c.so)
SAV<Optional<CommitID> >::finishSendAndDelPromiseRef() in flow.h:479 (libfdb_c.so)
ActorCallback<(anonymous namespace)::ErrorOrActor<Optional<CommitID> >, 0, Optional<CommitID> >::fire(Optional<CommitID> const&) in flow.h:999 (libfdb_c.so)
(anonymous namespace)::ErrorOrActorState<Optional<CommitID>, (anonymous namespace)::ErrorOrActor<Optional<CommitID> > >::a_callback_fire(ActorCallback<(anonymous namespace)::ErrorOrActor<Optional<CommitID> >, 0, Optional<CommitID> >*, Optional<CommitID>) in genericactors.actor.g.h:838 (libfdb_c.so)
(anonymous namespace)::ErrorOrActorState<Optional<CommitID>, (anonymous namespace)::ErrorOrActor<Optional<CommitID> > >::a_body1when1(Optional<CommitID> const&, int) in genericactors.actor.g.h:823 (libfdb_c.so)
(anonymous namespace)::ErrorOrActorState<Optional<CommitID>, (anonymous namespace)::ErrorOrActor<Optional<CommitID> > >::a_body1cont2(Optional<CommitID> const&, int) in genericactors.actor.g.h:816 (libfdb_c.so)
SAV<ErrorOr<Optional<CommitID> > >::finishSendAndDelPromiseRef() in flow.h:479 (libfdb_c.so)
ActorCallback<(anonymous namespace)::LoadBalanceActor<MasterProxyInterface, CommitTransactionRequest, MasterProxyInterface>, 6, ErrorOr<Optional<CommitID> > >::fire(ErrorOr<Optional<CommitID> > const&) in flow.h:999 (libfdb_c.so)
(anonymous namespace)::LoadBalanceActorState<MasterProxyInterface, CommitTransactionRequest, MasterProxyInterface, (anonymous namespace)::LoadBalanceActor<MasterProxyInterface, CommitTransactionRequest, MasterProxyInterface> >::a_callback_fire(ActorCallback<(anonymous namespace)::LoadBalanceActor<MasterProxyInterface, CommitTransactionRequest, MasterProxyInterface>, 6, ErrorOr<Optional<CommitID> > >*, ErrorOr<Optional<CommitID> >) in LoadBalance.actor.g.h:1518 (libfdb_c.so)
(anonymous namespace)::LoadBalanceActorState<MasterProxyInterface, CommitTransactionRequest, MasterProxyInterface, (anonymous namespace)::LoadBalanceActor<MasterProxyInterface, CommitTransactionRequest, MasterProxyInterface> >::a_body1loopBody1loopBody2when1(ErrorOr<Optional<CommitID> > const&, int) in LoadBalance.actor.g.h:1475 (libfdb_c.so)
SAV<CommitID>::finishSendAndDelPromiseRef() in flow.h:479 (libfdb_c.so)
ActorCallback<(anonymous namespace)::TryCommitActor, 2, CommitID>::fire(CommitID const&) in flow.h:999 (libfdb_c.so)
(anonymous namespace)::TryCommitActorState<(anonymous namespace)::TryCommitActor>::a_callback_fire(ActorCallback<(anonymous namespace)::TryCommitActor, 2, CommitID>*, CommitID) in NativeAPI.actor.g.cpp:12805 (libfdb_c.so)
(anonymous namespace)::TryCommitActorState<(anonymous namespace)::TryCommitActor>::a_body1cont3when2(CommitID const&, int) in NativeAPI.actor.cpp:2748 (libfdb_c.so)
void TransactionLogInfo::addLog<FdbClientLogEvents::EventCommit>(FdbClientLogEvents::EventCommit const&) in NativeAPI.actor.h:169 (libfdb_c.so)
BinaryWriter::WRITER& operator<< <BinaryWriter, FdbClientLogEvents::EventCommit>(BinaryWriter&, FdbClientLogEvents::EventCommit const&) in serialize.h:75 (libfdb_c.so)
void save<BinaryWriter, FdbClientLogEvents::EventCommit>(BinaryWriter&, FdbClientLogEvents::EventCommit const&) in serialize.h:113 (libfdb_c.so)
Serializer<BinaryWriter, FdbClientLogEvents::EventCommit, void>::serialize(BinaryWriter&, FdbClientLogEvents::EventCommit&) in serialize.h:106 (libfdb_c.so)
BinaryWriter& FdbClientLogEvents::EventCommit::serialize<BinaryWriter>(BinaryWriter&) in ClientLogEvents.h:171 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, double, int, int, CommitTransactionRef, Arena>(BinaryWriter&, double const&, int const&, int const&, CommitTransactionRef const&, Arena const&) in serialize.h:91 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, int, int, CommitTransactionRef, Arena>(BinaryWriter&, int const&, int const&, CommitTransactionRef const&, Arena const&) in serialize.h:91 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, int, CommitTransactionRef, Arena>(BinaryWriter&, int const&, CommitTransactionRef const&, Arena const&) in serialize.h:91 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, CommitTransactionRef, Arena>(BinaryWriter&, CommitTransactionRef const&, Arena const&) in serialize.h:90 (libfdb_c.so)
void save<BinaryWriter, CommitTransactionRef>(BinaryWriter&, CommitTransactionRef const&) in serialize.h:113 (libfdb_c.so)
Serializer<BinaryWriter, CommitTransactionRef, void>::serialize(BinaryWriter&, CommitTransactionRef&) in serialize.h:106 (libfdb_c.so)
void CommitTransactionRef::serialize<BinaryWriter>(BinaryWriter&) in CommitTransaction.h:151 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, VectorRef<KeyRangeRef, (VecSerStrategy)0>, VectorRef<KeyRangeRef, (VecSerStrategy)0>, VectorRef<MutationRef, (VecSerStrategy)0>, long>(BinaryWriter&, VectorRef<KeyRangeRef, (VecSerStrategy)0> const&, VectorRef<KeyRangeRef, (VecSerStrategy)0> const&, VectorRef<MutationRef, (VecSerStrategy)0> const&, long const&) in serialize.h:91 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, VectorRef<KeyRangeRef, (VecSerStrategy)0>, VectorRef<MutationRef, (VecSerStrategy)0>, long>(BinaryWriter&, VectorRef<KeyRangeRef, (VecSerStrategy)0> const&, VectorRef<MutationRef, (VecSerStrategy)0> const&, long const&) in serialize.h:91 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, VectorRef<MutationRef, (VecSerStrategy)0>, long>(BinaryWriter&, VectorRef<MutationRef, (VecSerStrategy)0> const&, long const&) in serialize.h:91 (libfdb_c.so)
BinaryWriter::WRITER& serializer<BinaryWriter, long>(BinaryWriter&, long const&) in serialize.h:90 (libfdb_c.so)
void save<BinaryWriter, long>(BinaryWriter&, long const&) in serialize.h:113 (libfdb_c.so)
Serializer<BinaryWriter, long, void>::serialize(BinaryWriter&, long&) in serialize.h:146 (libfdb_c.so)
void BinaryWriter::serializeBinaryItem<long>(long const&) in serialize.h:335 (libfdb_c.so)
BinaryWriter::writeBytes(int) in serialize.h:460 (libfdb_c.so)
operator new[](unsigned long, Arena&) in Arena.h:198 (libfdb_c.so)
ArenaBlock::allocate(Reference<ArenaBlock>&, int) in Arena.cpp:110 (libfdb_c.so)
ArenaBlock::create(int, Reference<ArenaBlock>&) in Arena.cpp:182 (libfdb_c.so)

The line numbers are from a local build of commit 0a46c6276efc90ef425666e381d1e118c115854c, which is currently tagged 6.2.19.

Here’s the trivial C program I used to get the trace. It just sets a large value over and over in 64 threads. Getting a tiny key in a single thread leaks memory as well, just more slowly, and I wanted to collect a trace where the program had eaten >8GiB of memory. :)

#define FDB_API_VERSION 610
#include <foundationdb/fdb_c.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

FDBDatabase *database;

// pthread_create wants a void *(*)(void *), but fdb_run_network is
// fdb_error_t (*)(void), so wrap it instead of passing it directly
void *run_network(void *unused)
{
  (void)unused;
  fdb_run_network();
  return NULL;
}

// x points at this thread's one-byte key
void *forever(void *x)
{
  uint8_t junk[8192];
  memset(junk, 0, sizeof junk); // the contents don't matter, only the size
  while(1) {
    FDBTransaction *t;
    if(fdb_database_create_transaction(database, &t)) {
      printf("failed to make transaction. whatever.\n");
      continue;
    }
    // generate senseless traffic
    fdb_transaction_set(t, x, 1, junk, sizeof junk);
    FDBFuture *f = fdb_transaction_commit(t);
    fdb_future_block_until_ready(f);
    fdb_future_destroy(f);
    fdb_transaction_destroy(t);
  }
}

int main(int argc, char *argv[])
{
  pthread_t network_thread;
  fdb_select_api_version(610);
  fdb_setup_network();
  pthread_create(&network_thread, NULL, run_network, NULL);
  fdb_create_database(NULL, &database);

  for(int i=0; i<64; i++) {
    pthread_t loop;
    uint8_t *v = malloc(1); // one-byte key, deliberately never freed
    *v = i;
    pthread_create(&loop, NULL, forever, v);
  }

  while(1)
    sleep(60);
}

heaptrack is available at https://github.com/KDE/heaptrack; there are packages for it in Ubuntu 18 at least. The above program will eat memory faster than I can send you my trace, but if anyone wants it, just get my email address from my GitHub profile and hit me up. It is just under 10MiB in size.

FWIW, this was originally spotted in a Go program using that language’s library. I boiled it down to this tiny C thing in hopes that valgrind would have a chance on something that was pure C.

It’s possible that this isn’t a memory leak in the sense of “lost a pointer to a heap allocation and now I can’t free it”, which might be why valgrind isn’t finding it. I’m not super familiar with this code, but it looks like information about every commit is getting stored in an in-memory data structure that gets flushed by default every 10 seconds. It’s plausible that committing might outpace flushing, and this thing could just be growing without bound?
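To put rough numbers on the "committing outpaces flushing" idea, here is a toy model in C. It is not FoundationDB code, and every parameter (commit rate, bytes logged per commit, how much one flush drains) is a made-up assumption; it only shows how a backlog behaves when a periodic flush drains less than what arrives between flushes.

```c
#include <stdint.h>

/* Toy model, NOT FoundationDB code. Returns the bytes still buffered after
 * `seconds`, when every second adds commits_per_sec * bytes_per_commit bytes
 * and a flush every flush_interval_sec seconds drains at most flush_bytes.
 * All parameter names and values are hypothetical. */
uint64_t backlog_bytes(uint64_t seconds, uint64_t commits_per_sec,
                       uint64_t bytes_per_commit,
                       uint64_t flush_interval_sec, uint64_t flush_bytes)
{
    uint64_t backlog = 0;
    for (uint64_t t = 1; t <= seconds; t++) {
        /* new profiling records appended during this second */
        backlog += commits_per_sec * bytes_per_commit;
        /* periodic flush: drain what we can, never below zero */
        if (flush_interval_sec != 0 && t % flush_interval_sec == 0)
            backlog -= (backlog < flush_bytes) ? backlog : flush_bytes;
    }
    return backlog;
}
```

With 1000 commits per second of 8192 bytes each and a flush that drains 10MiB every 10 seconds, the model still holds about 428MB after one minute; with 10 tiny commits per second the same flush keeps it at zero. Whether the real flush is capped like this is exactly the part I have not verified.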

Could be. With the code that caused this problem for me originally, I was able to stop all FDB activity, while keeping the process alive. I never observed the memory usage going back down over timescales of, say, 5 to 30 minutes.
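For anyone who wants to repeat the "stop all traffic and watch the memory" check from inside the process, a minimal sketch follows. It assumes Linux, since it reads /proc/self/statm, and the function name resident_bytes is mine, not part of any FDB API.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the resident set size of the calling process in bytes, or -1 on
 * failure. Linux-only assumption: the second field of /proc/self/statm is
 * the number of resident pages. */
long resident_bytes(void)
{
    long total_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    int fields = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (fields != 2)
        return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}
```

Printing this once a minute from the main thread's sleep loop, before and after the worker threads go idle, would give a record of whether the resident size ever drops.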

For what it is worth, heaptrack does seem to believe that the memory is leaked. I’ve got no idea how its definition differs from valgrind’s.

OK, sounds like it’s not just a queue growing without bound, then.

If the memory is being allocated from fast allocator pools, usage would never go back down, since we never return that memory to the OS. Not sure if that’s the case here or not, though. The stack trace above suggests that particular allocation isn’t coming from fast allocator, in which case it should be freed once we are done with it.
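To illustrate why pooled memory never shows up as freed, here is a generic free-list pool in C. This is a sketch of the general technique only, not FoundationDB's actual fast allocator; the names Pool, pool_alloc and pool_release and the 64-byte block size are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64 /* hypothetical fixed block size */

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

typedef struct {
    FreeNode *free_list;      /* blocks released by callers, kept for reuse */
    size_t bytes_from_system; /* total ever obtained from malloc */
} Pool;

/* Hand out a block, reusing a released one when possible. */
void *pool_alloc(Pool *p)
{
    if (p->free_list) {
        FreeNode *n = p->free_list;
        p->free_list = n->next;
        return n;
    }
    void *block = malloc(BLOCK_SIZE);
    if (block)
        p->bytes_from_system += BLOCK_SIZE;
    return block;
}

/* Put a block back on the free list. Note that free() is never called, so
 * the process footprint does not shrink. */
void pool_release(Pool *p, void *block)
{
    assert(block != NULL);
    FreeNode *n = (FreeNode *)block;
    n->next = p->free_list;
    p->free_list = n;
}
```

After every block has been released, bytes_from_system is unchanged and the next pool_alloc simply hands back the most recently released block. A process built on pools like this plateaus at its high-water mark instead of shrinking, which from the outside looks like a leak that has stopped growing rather than memory that keeps climbing.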