FoundationDB 7.1.24 - memory usage after a clean startup of an fdbserver process is too high

We have a cluster of 7 nodes with a total of 56 fdbserver processes, storing over 50 billion entries. After a clean restart of an fdbserver process, its memory usage exceeds 17 GB. If we set the memory limit in foundationdb.conf below 17 GB, the fdbserver processes fail to start.

We are unsure why so much memory is needed. If we were to scale the cluster to store 100 billion entries, how much memory would be required?

Any help would be much appreciated.

What storage engine are you using? If it’s the memory engine, it will take up a lot of memory.

Thanks for your reply. The storage engine is ssd-2. Running status in fdbcli gives us:

$ fdbcli
Using cluster file `fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
fdb> status

Using cluster file `fdb.cluster'.

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 7
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 56
  Zones                  - 7
  Machines               - 7
  Memory availability    - 32.0 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 2 machines
  Server time            - 03/22/23 10:17:44

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 49.688 TB
  Disk space used        - 197.873 TB

Operating space:
  Storage server         - 3321.1 GB free on most full server
  Log server             - 3321.9 GB free on most full server

Workload:
  Read rate              - 345451 Hz
  Write rate             - 0 Hz
  Transactions started   - 337321 Hz
  Transactions committed - 0 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 03/22/23 10:17:44

Not sure if this helps, but …

You have 7 machines running 56 processes; I am assuming that is 8 per machine. What is the configuration in the [fdbserver] section?
https://apple.github.io/foundationdb/configuration.html#fdbserver-section
Particularly memory=. If it is defaulting to 8 GB, then as far as I know you need at most 8 × 8 GB = 64 GB per machine.

In addition, your earlier message sounded as if the memory usage goes right back up after a reboot (even if there is no activity). Your status output indicates a fairly good amount of traffic. Does it really go that high as soon as the db is up, or only after getting enough traffic?

Clarifying these might help someone answer the question.

I’m not familiar with the tool you are using, but a “Memory” of 17 GB and a “Memory Used” of 52% would seem to indicate a 17 GB virtual memory footprint, about half of which is actually resident. Is that correct?

Virtual memory will often be a factor of 2x or 3x larger than resident memory, but resident is what matters. In FDB 7.1, the malloc implementation was changed from glibc to jemalloc, which has a much larger virtual-to-resident ratio.

There are two fdbserver options that limit memory: --memory sets a resident (RSS) limit, and --memory-vsize sets a virtual memory limit.
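
For reference, a minimal sketch of how these could be set in the [fdbserver] section of foundationdb.conf (the values are illustrative, not recommendations; fdbmonitor passes each key through to fdbserver as the matching command-line option):

[fdbserver]
# Resident (RSS) limit; passed to fdbserver as --memory
memory = 17GiB
# Virtual memory limit; passed to fdbserver as --memory-vsize
memory-vsize = 64GiB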

Thanks for your questions!

Yes, we have 8 fdbserver processes on each machine, and each process manages 1 NVMe disk.

The [fdbserver] section is:

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public-address = auto:$ID
listen-address = public
datadir = /foundationdb/data/$ID
logdir = /var/log/foundationdb
logsize = 200MiB
maxlogssize = 8MiB
# machine-id =
# datacenter-id =
# class =
memory = 32GiB
# storage-memory = 1GiB
cache-memory = 8GiB
# metrics-cluster =
# metrics-prefix =

Each machine in our cluster is equipped with 512GB memory.

The memory usage increases significantly as soon as the database is up, and during startup the CPU usage of each fdbserver process reaches approximately 100%. Once the cluster is back to a healthy state, there is no traffic from outside clients and no internal traffic between fdbserver processes.

Thanks for your reply.

The tool reports resident memory usage, and /proc also shows that the resident memory usage of fdbserver is above 17 GB. On one node, executing the following command gives resident and virtual memory usage in KiB (the first column is resident memory, the second is virtual memory):

$ pidof fdbserver | sed -e 's: : -p :g' | xargs ps -o rss= -o vsz= -p
16868952 17029776
16898596 17085836
16866468 17006220
16779736 17007628
16819000 17029008
16829896 16999052
16802448 16991244
16746752 16969484
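
For easier comparison with the status output, the RSS column can be converted from KiB to GiB by dividing by 1024², e.g.:

$ pidof fdbserver | sed -e 's: : -p :g' | xargs ps -o rss= -p | awk '{printf "%.1f GiB\n", $1/1048576}'

which comes out to roughly 16.0-16.1 GiB (about 17.2-17.3 GB) per process on this node.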

And the current detailed status is:

>>> status details
Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 7
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 56
  Zones                  - 7
  Machines               - 7
  Memory availability    - 32.0 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 2 machines
  Server time            - 03/24/23 11:55:00

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 49.702 TB
  Disk space used        - 196.752 TB

Operating space:
  Storage server         - 3352.1 GB free on most full server
  Log server             - 3343.0 GB free on most full server

Workload:
  Read rate              - 29 Hz
  Write rate             - 1 Hz
  Transactions started   - 5 Hz
  Transactions committed - 0 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  10.142.7.74:4500       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.74:4501       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.74:4502       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.74:4503       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.6 GB / 32.0 GB RAM  )
  10.142.7.74:4504       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.74:4505       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.74:4506       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.74:4507       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.7 GB / 32.0 GB RAM  )
  10.142.7.85:4500       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.5 GB / 32.0 GB RAM  )
  10.142.7.85:4501       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.6 GB / 32.0 GB RAM  )
  10.142.7.85:4502       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.5 GB / 32.0 GB RAM  )
  10.142.7.85:4503       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.5 GB / 32.0 GB RAM  )
  10.142.7.85:4504       (  1% cpu;  0% machine; 0.002 Gbps;  2% disk IO;17.5 GB / 32.0 GB RAM  )
  10.142.7.85:4505       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.5 GB / 32.0 GB RAM  )
  10.142.7.85:4506       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.5 GB / 32.0 GB RAM  )
  10.142.7.85:4507       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;17.4 GB / 32.0 GB RAM  )
  10.142.7.86:4500       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.2 GB / 32.0 GB RAM  )
  10.142.7.86:4501       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.2 GB / 32.0 GB RAM  )
  10.142.7.86:4502       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.3 GB / 32.0 GB RAM  )
  10.142.7.86:4503       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.86:4504       (  1% cpu;  0% machine; 0.003 Gbps;  2% disk IO;16.2 GB / 32.0 GB RAM  )
  10.142.7.86:4505       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.86:4506       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.86:4507       (  1% cpu;  0% machine; 0.003 Gbps;  1% disk IO;16.2 GB / 32.0 GB RAM  )
  10.142.7.88:4500       (  1% cpu;  0% machine; 0.004 Gbps;  2% disk IO;16.2 GB / 32.0 GB RAM  )
  10.142.7.88:4501       (  1% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.88:4502       (  1% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.88:4503       (  1% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.7 GB / 32.0 GB RAM  )
  10.142.7.88:4504       (  1% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.88:4505       (  1% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.2 GB / 32.0 GB RAM  )
  10.142.7.88:4506       (  1% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.88:4507       (  3% cpu;  0% machine; 0.004 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4500       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4501       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4502       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4503       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4504       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4505       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4506       (  1% cpu;  0% machine; 0.002 Gbps;  2% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.89:4507       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4500       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4501       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4502       (  1% cpu;  0% machine; 0.002 Gbps;  2% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4503       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4504       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4505       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4506       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.90:4507       (  1% cpu;  0% machine; 0.002 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4500       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4501       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4502       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4503       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4504       (  0% cpu;  0% machine; 0.001 Gbps;  0% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4505       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4506       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )
  10.142.7.91:4507       (  1% cpu;  0% machine; 0.001 Gbps;  1% disk IO;16.1 GB / 32.0 GB RAM  )

Coordination servers:
  10.142.7.74:4500  (reachable)
  10.142.7.85:4500  (reachable)
  10.142.7.86:4500  (reachable)
  10.142.7.88:4500  (reachable)
  10.142.7.89:4500  (reachable)
  10.142.7.90:4500  (reachable)
  10.142.7.91:4500  (reachable)

Client time: 03/24/23 11:55:00

We used the memory option in foundationdb.conf (fdbserver’s --memory) to limit resident memory.


Thanks for confirming that the 17 GB is RSS.

I see the problem. You have 49 TB of KV data and 56 processes. After triple replication, this is 49 * 3 / 56 = 2.6 TB of KV data that each process is responsible for.

There is a data structure on storage servers called the Byte Sample, which stores a deterministic random sample of keys. This data is persisted on disk in the storage engine and is loaded immediately upon storage server startup. Unfortunately, its size is not tracked or reported, but it grows linearly with KV size, and I suspect yours is somewhere around 4-6 GB based on the memory usage I’ve seen for smaller storage KV sizes.
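
As a rough sanity check (the constants here are assumptions for illustration, not measured values): if the default knobs end up sampling on the order of one key per ~25 KB of KV data, and each retained sample entry costs on the order of 50 bytes in memory, then 2.6 TB per process works out to about 5 GB:

$ awk 'BEGIN {
    kv_bytes   = 2.6e12   # KV data per storage process, from above
    kv_per_key = 25e3     # assumed KV bytes per sampled key at default knobs
    entry_cost = 50       # assumed in-memory bytes per sample entry
    printf "%.1f GB\n", kv_bytes / kv_per_key * entry_cost / 1e9
}'
5.2 GB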

The byte sample rate is technically configurable via a knob; however, changing that knob once a cluster is created is undefined behavior. FDB relies on the byte sample’s determinism to know how much logical data is in each shard and on each storage server. Weird things will happen if you change this knob on an existing cluster, and the cluster may become unavailable. I don’t think any data loss would occur, but you could easily get into a situation that is hard to get out of.

If you need to reduce memory usage for your disk sizes, you could reduce the cache-memory setting.
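
For example, in the [fdbserver] section of foundationdb.conf (the value here is illustrative; the right number depends on your workload):

[fdbserver]
# Reduced from the current 8GiB to free headroom under the memory limit
cache-memory = 4GiB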

If you want to reduce the size of the byte sample, you would have to create a new cluster and migrate your data to it. You would also have to make sure that no storage servers on the new cluster ever start up without the knob override. The option is knob_byte_sampling_factor and the default is 250. Multiply this by N to reduce the byte sample size by a factor of N.
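
A sketch of what that override could look like in foundationdb.conf, assuming knob_-prefixed keys in the [fdbserver] section are passed through to fdbserver as knob overrides (2500 is illustrative: 10x the default of 250, for a ~10x smaller byte sample):

[fdbserver]
# Must be in place before any storage server on the new cluster ever starts
knob_byte_sampling_factor = 2500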


Thank you for providing that valuable information. We’ll try it in our future benchmarking.

We are also curious whether the Byte Sample is a global data structure that each fdbserver process keeps a replica of, or a local data structure that only contains randomly sampled keys from the key-value pairs it manages.

The Byte Sample is a local data structure that each storage server maintains; it is used for estimating shard sizes on that storage server.
