We’re currently ingesting a reasonably large amount of data (millions of records, tens of GB) from a Kafka cluster into FoundationDB, and our container(s) doing the work were gradually increasing their RAM usage until K8s killed them for exceeding their memory limits (it took roughly 10 minutes to climb from the expected 1-1.5 GB up to 5 GB).
I did a load of testing and narrowed it down to calls to `.set`, `.get[Range]` and `.clear[Range]` within a transaction. If I opened a transaction and did nothing before committing and closing it, there was no memory leak; as soon as I performed any operation on the DB, memory started leaking. (We were using `Database.run` for this, so the commit/close was encapsulated within that method.)
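In case it helps, this is roughly the shape of our ingest loop, heavily simplified; the key layout, batch sizes and values here are made up for illustration, but the `Database.run` / `set` / `getRange` / `clear` pattern matches what we do:

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.KeyValue;
import com.apple.foundationdb.tuple.Tuple;

public class IngestRepro {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(710);
        try (Database db = fdb.open()) {
            // Repeatedly run transactions doing set/getRange/clear, mirroring the
            // ingestion loop; resident memory grows even though the heap stays flat.
            for (int batch = 0; batch < 1_000; batch++) {
                final int b = batch;
                db.run(tr -> {
                    // Write a batch of records (illustrative keys/values)
                    for (int i = 0; i < 100; i++) {
                        byte[] key = Tuple.from("ingest", b, i).pack();
                        tr.set(key, Tuple.from("value-" + i).pack());
                    }
                    // Read back the batch within the same transaction
                    for (KeyValue kv : tr.getRange(Tuple.from("ingest", b).range())) {
                        // ... process kv ...
                    }
                    // Clear the range again
                    tr.clear(Tuple.from("ingest", b).range());
                    return null; // commit/close handled by Database.run
                });
            }
        }
    }
}
```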
Heap dumps of the JVM showed no issues (the heap sat steadily at ~200-300 MB), but resident memory just kept climbing. It certainly looked like a leak somewhere in the FDB native bindings, but I’m always inclined to blame our own code before a well-known third-party open source system.
Separately, our log aggregator started rate limiting us because of the sheer volume of FoundationDB logs we were pushing. I had a look through our code and spotted that one of our engineers had enabled `setServerRequestTracing` on the transaction options, even though we in the infra team were seeing no performance or other problems that would warrant it, so I turned it back off.
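The option was being set inside the transaction body, something like the sketch below (again simplified; the key/value are placeholders):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.tuple.Tuple;

public class TracingToggle {
    public static void main(String[] args) {
        try (Database db = FDB.selectAPIVersion(710).open()) {
            db.run(tr -> {
                // This is the call that had been added; redeploying without it
                // is what stopped the resident-memory growth for us.
                tr.options().setServerRequestTracing();

                tr.set(Tuple.from("example-key").pack(),
                       Tuple.from("example-value").pack());
                return null;
            });
        }
    }
}
```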
As soon as I did so (and redeployed the container), all our memory leak issues went away. It certainly seems like there is a memory leak when server request tracing is enabled, even though I thought this option didn’t do much on the client side.
Is there something being cached/logged on the client because of this option that I need to manually close or clean up, or is this a genuine memory leak I should raise a GitHub issue (or similar) for?
Our FDB cluster is running 7.1.21, and the client library is 7.1.10. I’m also looking into upgrading the client lib to match the cluster.