FDB server or coordinator goes down due to out of memory

The FDB server or coordinator in my product, which uses the config from foundationdb/packaging/docker/samples/golang at master · apple/foundationdb · GitHub, goes down within a few days even though the host machine has plenty of free memory. The official sample itself will probably also go down after several days. How can we resolve this problem?

91158d4042b2        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"    2 days ago          Exited (20) 9 hours ago                            _fdb-server-1_1
c3ed74947f43        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"    2 days ago          Up 2 days                 0.0.0.0:4500->4500/tcp   _fdb-coordinator_1

Related: Foundationdb 6.2 - fdbserver going out of memory - #8 by tuk

Official sample’s current stats:

CONTAINER ID        NAME                                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
486348df172b        fdbgolangsample_app_1               0.09%               13.94MiB / 38.24GiB   0.04%               12.6MB / 17.6MB     57MB / 0B           8
fa6e3729a803        fdbgolangsample_fdb-server-1_1      5.67%               3.196GiB / 38.24GiB   8.36%               826MB / 850MB       0B / 906MB          9
fd8a17a2195f        fdbgolangsample_fdb-coordinator_1   2.47%               1.826GiB / 38.24GiB   4.78%               869MB / 837MB       41kB / 1.36GB       9

Official sample’s stats some time later (memory usage keeps growing):

CONTAINER ID        NAME                                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
486348df172b        fdbgolangsample_app_1               0.08%               13.95MiB / 38.24GiB   0.04%               28.1MB / 39.2MB     57MB / 0B           8
fa6e3729a803        fdbgolangsample_fdb-server-1_1      5.75%               7.122GiB / 38.24GiB   18.62%              1.84GB / 1.89GB     0B / 2.02GB         9
fd8a17a2195f        fdbgolangsample_fdb-coordinator_1   3.45%               4.057GiB / 38.24GiB   10.61%              1.94GB / 1.87GB     41kB / 3.03GB       9

The official sample went down after 2 days:

CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                      PORTS                    NAMES
486348df172b        fdbgolangsample_app                "/start.bash"            2 days ago          Up 2 days                   0.0.0.0:8080->8080/tcp   fdbgolangsample_app_1
fa6e3729a803        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Exited (20) 8 minutes ago                            fdbgolangsample_fdb-server-1_1
fd8a17a2195f        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Up 2 days                   0.0.0.0:4500->4500/tcp   fdbgolangsample_fdb-coordinator_1
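In case it helps narrow this down, these are generic checks (not taken from the output above) for whether the kernel OOM killer terminated the server container, plus its final output:

# Was the container OOM-killed, and with what exit code and finish time?
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}} {{.State.FinishedAt}}' fdbgolangsample_fdb-server-1_1

# Last output from the fdbserver container before it exited
docker logs --tail 100 fdbgolangsample_fdb-server-1_1

# Kernel-level OOM kill messages on the host, if any
dmesg | grep -i -E 'killed process|out of memory'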

It looks like memory usage does not grow with the Go and Python bindings on FDB 6.2, but it does grow on FDB 6.3. So the cause of the problem is probably in FDB 6.3.

I don’t have any experience with the golang sample, but I don’t think I’ve observed a leak in other 6.3 usage yet. I’ll try to reproduce with the sample, and if the problem reproduces we can try to get to the bottom of it.
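For reference, roughly the steps to bring the sample up (a sketch, assuming the compose file lives at that path in the apple/foundationdb repo; adjust for your docker-compose version):

# Clone the repo and start the golang sample
git clone https://github.com/apple/foundationdb.git
cd foundationdb/packaging/docker/samples/golang
docker-compose up -d

# Watch memory usage of the sample's containers over time
docker stats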

Thanks. I opened FDB 6.3 goes down with exit code 20 or 139 · Issue #5405 · apple/foundationdb · GitHub because I also reproduced the problem on the official samples.

I haven’t definitively shown this to be the problem, but one thing I noticed about the 6.3 sample is that there is no /var/fdb/logs directory, and yet this is where the server processes are writing their logs. Creating this directory while a process was running caused it to generate all of the files it had been intending to write.

Meanwhile, when I tried this sample with a 6.2 image, the logs directory was already present.

My theory, then, is that the increased memory usage comes from all of the trace log events being buffered in memory because they cannot be written to disk. If so, this problem is primarily one of how the docker image is set up. That said, the server process probably should not buffer an unbounded number of trace events in this situation either.
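If that theory is right, then as a stopgap something like the following should let the buffered trace events drain to disk (a sketch: the container names are taken from the output above, and I haven’t verified how the sample’s entrypoint behaves across restarts):

# Create the missing trace log directory inside the running containers;
# the buffered trace events should then be flushed to disk
docker exec fdbgolangsample_fdb-server-1_1 mkdir -p /var/fdb/logs
docker exec fdbgolangsample_fdb-coordinator_1 mkdir -p /var/fdb/logs

# Alternatively, mount a host directory over /var/fdb/logs in the sample's
# docker-compose.yml so the directory always exists across restarts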