Memory leak in the C client library when a timeout is set on transaction options?

Hi @alexmiller @ajbeamon, while trying to estimate the memory used by the FDB C client for our Java-based layer, I looked at MemoryMetrics traces by enabling tracing on the client side.

It appears that when a large timeout is set on a transaction, the memory used by the transaction is not freed for the entire duration of that timeout, even though the transaction itself has completed and been closed.

I am still not sure whether there is some error in my methodology or measurement.

Below is a trivial code snippet that reproduces the problem: it creates one million KV entries, and then in a subsequent clean run reads each of these entries serially.

If no explicit timeout is set on the transaction, the TotalMemory64 counter reported by MemoryMetrics stabilizes at ~6 MB for the lifetime of the process; however, if the timeout is set to a large value, say 10 minutes, TotalMemory64 keeps growing to about 150 MB (which I suppose is my total data size).

I have tried this on a Mac for now, but I can try it on Linux if that could make a difference.
FDB version used: v6.1.8

Also attaching the trace file that indicates a memory build-up, and one that seems to be clean:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    import com.apple.foundationdb.Database;
    import com.apple.foundationdb.async.AsyncUtil;
    import com.apple.foundationdb.tuple.Tuple;

    public static void read(final Database db) {
        final long TEN_MIN = 10L * 60 * 1000;
        for (int i = 0; i < 1_000_000; i++) {
            final Tuple key = Tuple.from(1, 1, "a" + i);
            db.run(tx -> {
                // ****This option seems to create memory build up in C client ****
                tx.options().setTimeout(TEN_MIN);
                // **************************************************************
                return tx.get(key.pack()).join();
            });
            if (i % 10_000 == 0) {
                System.out.println("read " + i + " rows");
            }
        }
        System.out.println("-- done reading --");
    }

    public static void write(final Database db) {
        final byte[] val = "CiwKB09jZWFuaWESCUF1c3RyYWxpYRoCQVUiACoAMgA6AEIASgJlblIAWgBiAA==".getBytes();
        final List<CompletableFuture<Void>> tasks = new ArrayList<>();
        final Semaphore permits = new Semaphore(100);
        for (int i = 0; i < 1_000_000; i++) {
            final Tuple key = Tuple.from(1, 1, "a" + i);
            permits.acquireUninterruptibly();
            tasks.add(db.runAsync(tx -> {
                tx.set(key.pack(), val);
                return CompletableFuture.<Void>completedFuture(null);
            }).whenComplete((v, t) -> permits.release()));
        }
        AsyncUtil.whenAll(tasks).join();
        System.out.println("-- all writes done --");
    }

    public static void main(String[] args) {
        final com.apple.foundationdb.FDB fdb = com.apple.foundationdb.FDB.selectAPIVersion(610);
        fdb.options().setTraceEnable("/tmp/");
        fdb.options().setTraceMaxLogsSize(4194304);
        fdb.options().setTraceRollSize(1048576);
        final Database db = fdb.open("/usr/local/etc/foundationdb/fdb.cluster", Executors.newFixedThreadPool(10));

        // write(db);  // On the first run, uncomment this to populate the data.
        // read(db);   // From the second run onwards, comment out write() and uncomment this call.
        System.exit(0);
    }

Interesting, I’ll try to reproduce and then identify what’s going on.

Some more info that may be useful:

  • Memory build-up seems independent of the actual key values fetched; it looks like some fixed overhead with every transaction. The issue is much smaller when using get_range() to retrieve the same number of rows (due to fewer transactions), and the build-up increases as the limit per get_range() shrinks, since more transactions are needed to fetch the same number of rows (a sketch of that variant appears after the snippet below).
  • Even if the keys are not present in FDB, the same pattern is observed, indicating that the actual data rows are not what is being held in memory.
  • This memory build-up is much lower than what we would see if the freeing of transaction resources (and the associated native Futures) were blocked altogether. I simulated that by blocking all the threads that would fetch the value from the C library and call Future_dispose (i.e., blocking every thread of the executor in the code snippet below):
com.apple.foundationdb.NativeFuture.java:

    protected void registerMarshalCallback(Executor executor) {
        if(cPtr != 0) {
            Future_registerCallback(cPtr, () -> executor.execute(this::marshalWhenDone));
        }
    }
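
For reference, here is a rough sketch of the get_range() variant mentioned in the first bullet above. The method name, the limit value, and the cursor handling are illustrative rather than our exact code; it assumes the same keyspace as the repro snippet:

    // Additional imports for this sketch: com.apple.foundationdb.KeySelector,
    // com.apple.foundationdb.KeyValue, com.apple.foundationdb.Range
    public static void readWithRange(final Database db) {
        final long TEN_MIN = 10L * 60 * 1000;
        final int LIMIT = 10_000; // rows per transaction; a smaller limit means more transactions
        final Range full = Range.startsWith(Tuple.from(1, 1).pack());
        KeySelector begin = KeySelector.firstGreaterOrEqual(full.begin);
        final KeySelector end = KeySelector.firstGreaterOrEqual(full.end);
        while (true) {
            final KeySelector cursor = begin;
            final List<KeyValue> batch = db.run(tx -> {
                tx.options().setTimeout(TEN_MIN);
                return tx.getRange(cursor, end, LIMIT).asList().join();
            });
            if (batch.isEmpty()) {
                break;
            }
            // resume just after the last key returned in this batch
            begin = KeySelector.firstGreaterThan(batch.get(batch.size() - 1).getKey());
        }
        System.out.println("-- done reading via get_range --");
    }

With LIMIT at 10,000 this performs roughly 100 transactions instead of one million, and the build-up shrinks accordingly.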

This issue is quite severe for us: we are running into a problem where some of our processes are being killed by the OS for exceeding their memory reservations when doing a lot of read transactions over short durations.

That seems consistent with what I’ve observed of this as well. I think the memory is being held in task queues, but so far the backtraces I’ve gotten have pointed somewhere a bit different than I expected. My initial thought was that the timers were living in the queue until they elapsed (which I think is still true), but the allocation backtraces curiously point to an actor related to the destruction of the transaction. I haven’t gotten to the bottom of it yet, so I’ll keep looking.

Thanks for looking into this issue!

I looked into this some more, and it seems there’s something about the workload pattern here that made the memory allocation tracking results a bit confusing. However, I’m fairly sure now that the memory accumulating in this test is what I initially suspected: the timer tasks set up for the timeout hold memory (128 bytes’ worth in the 64-byte fast-allocated pools, plus other memory that isn’t going to be tracked in the pools) for the entire duration of their timeout, regardless of whether the transaction has been destroyed. For one million transactions, 128 bytes each already comes to ~128 MB, which is roughly in line with the ~150 MB build-up reported above.

I created an issue to raise this concern with the group: https://github.com/apple/foundationdb/issues/2335

Thank you for spending time on this. In the meantime, we have made a few changes to use shorter timeouts on transactions, with layer-level timeout/retry management to support larger effective timeouts.
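
For anyone else hitting this, a minimal sketch of what that layer-level management could look like. The method name, the 5-second per-transaction timeout, and the deadline handling here are illustrative assumptions, not our exact implementation:

    public static byte[] readWithLayerDeadline(final Database db, final byte[] key) {
        final long SHORT_TIMEOUT_MS = 5_000;                                 // per-transaction timeout
        final long deadline = System.currentTimeMillis() + 10L * 60 * 1000; // overall layer-level budget
        while (true) {
            try {
                return db.run(tx -> {
                    // a short timeout means the native timer task only pins memory briefly
                    tx.options().setTimeout(SHORT_TIMEOUT_MS);
                    return tx.get(key).join();
                });
            } catch (RuntimeException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e; // overall budget exhausted; surface the failure
                }
                // otherwise retry with a fresh, short-lived transaction
            }
        }
    }

The point is that no single native transaction ever holds a long-lived timer; the long timeout exists only at the layer level.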