Storage process memory slowly grows until OOM

walter.Liang · March 3, 2026, 3:12pm

Hello everyone,

We are running a three-node FoundationDB cluster as the key-value backend in our production environment.
On each machine, we have deployed one storage process, one transaction process, and two stateless processes.
The redundancy mode is set to double, and each machine has 4 cores and 16 GB RAM.
We are using FoundationDB version 7.3.6, and the storage engine is ssd-2.

The cluster was started on December 26, 2025. Around February 5, 2026, we observed that all storage processes restarted, most likely due to an out-of-memory (OOM) condition.
All configuration parameters use their default values: memory is 8 GB and cache-memory is 2 GB.

From system-level monitoring, we saw that memory usage per machine started at around 2 GB, then gradually and steadily increased to about 10 GB over the course of approximately 30 days.
Shortly after reaching that peak, memory usage dropped sharply back to 2 GB, which we believe corresponds to the moment the storage processes self-killed and restarted.
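A steady climb like this can be summarized as a growth rate with a least-squares line fit. The sketch below is a minimal illustration: `fit_growth` and `days_until` are hypothetical helpers, not FDB tools. Its only input is the pair of endpoints quoted above (about 2 GB at day 0 and about 10 GB at day 30), which works out to roughly 0.27 GB/day.

```python
# Minimal sketch: fit a straight line to (day, GB) memory samples and
# project when usage would cross a limit. The samples are whatever your
# system monitoring exports; nothing here talks to FoundationDB.

def fit_growth(samples):
    """Least-squares slope (GB/day) and intercept (GB) for (day, gb) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until(limit_gb, slope, intercept):
    """Day (on the samples' axis) at which the fitted line reaches limit_gb."""
    if slope <= 0:
        return None  # usage is flat or shrinking: no projected crossing
    return (limit_gb - intercept) / slope


if __name__ == "__main__":
    # Endpoints reported in this thread: ~2 GB at day 0, ~10 GB at day 30.
    slope, intercept = fit_growth([(0, 2.0), (30, 10.0)])
    print(f"growth: {slope:.3f} GB/day")
```

With real per-process samples, `days_until(8.0, slope, intercept)` would estimate when a storage process reaches the default 8 GB memory limit. The per-machine figures above also include the other processes on each host.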

We have confirmed that the memory growth was driven by the storage processes, but we’re still trying to understand why the memory usage increased in such a pattern over time.

We would greatly appreciate any insights, suggestions, or similar experiences from the community!

Thanks a lot in advance!

[Screenshot: system monitor]

walter.Liang · March 3, 2026, 3:16pm

[Attachment: 2026-2-28 storage process trace log]

jzhou · March 8, 2026, 11:57pm

FDB uses a custom memory allocator, FastAllocator (see the "Memory Considerations" page on the apple/foundationdb GitHub wiki), which reports the memory usage of each of its block sizes in MemoryMetrics events. You might want to graph the TotalMemory field over time to see whether FastAllocator is gradually using more memory. If it is, the behavior is expected, because FastAllocator does not return free pages to the OS (for memory usage efficiency), and I'd recommend increasing the memory size of the storage server (SS) roles. Note that not all memory is consumed by FastAllocator.
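One way to graph TotalMemory is to extract it from the trace files. This minimal sketch makes three assumptions: the default XML trace format, with one self-closing `<Event .../>` element per line carrying `Type` and `Time` attributes; TotalMemory reported in bytes; and the hypothetical helper name `event_series`. Trace files written in JSON format would need a different parser.

```python
import sys
import xml.etree.ElementTree as ET


def event_series(lines, event_type, field):
    """Yield (time, value) for each <Event Type=event_type .../> line that has `field`.

    Assumes one self-closing <Event .../> element per line; header/footer lines
    such as <?xml ...?>, <Trace> and </Trace> are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event "):
            continue
        try:
            ev = ET.fromstring(line)
        except ET.ParseError:
            continue  # e.g. a partially written last line
        if ev.get("Type") != event_type or field not in ev.attrib:
            continue
        yield float(ev.get("Time", "nan")), float(ev.get(field))


if __name__ == "__main__":
    # Usage (hypothetical file names): python memseries.py trace.*.xml > totalmemory.tsv
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for t, v in event_series(f, "MemoryMetrics", "TotalMemory"):
                print(f"{t}\t{v / 2**30:.3f}")  # assumed bytes -> GiB
```

The output is a tab-separated (time, GiB) series that can be plotted next to the process's resident memory from system monitoring.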

There are other possibilities. For instance, we found that disabling transparent huge pages is needed when running fdbserver 7.x with sqlite on RHEL9 (see the apple/foundationdb GitHub wiki page about sqlite OOM on RHEL9).
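To check whether THP is currently enabled, read the standard Linux sysfs file. This minimal sketch assumes a kernel that exposes /sys/kernel/mm/transparent_hugepage/enabled. `active_mode` is a hypothetical helper, and the exact steps recommended on the wiki page are not reproduced here.

```python
from pathlib import Path

# Standard Linux sysfs knob; the bracketed word is the active mode,
# e.g. "always madvise [never]".
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_mode(text):
    """Return the bracketed (active) mode from a THP sysfs file's contents."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    try:
        mode = active_mode(THP_ENABLED.read_text())
    except OSError:
        mode = None  # not Linux, or THP not available in this kernel
    print(f"transparent_hugepage/enabled: {mode}")
    if mode not in (None, "never"):
        # To disable until the next reboot (as root):
        #   echo never > /sys/kernel/mm/transparent_hugepage/enabled
        print("THP is not disabled")
```

The `echo never` change does not survive a reboot. Making it persistent is distribution-specific, for example with the `transparent_hugepage=never` kernel boot parameter.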

The "Debug Out Of Memory (OOM) Errors in Simulation and Production" page on the apple/foundationdb GitHub wiki mentions two more trace events to check: GetMagazineSample and HugeArenaSample.
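Both events can be tallied from the same trace files. This minimal sketch assumes the XML trace format with one `<Event .../>` per line. The `Size` attribute name is also an assumption, so check a real event in your logs and pass a different `size_field` if needed. `summarize` is a hypothetical helper.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def summarize(lines, event_types, size_field="Size"):
    """Count events of the given types and track the largest `size_field` seen.

    `size_field` defaults to "Size", which is an assumed attribute name.
    """
    counts = Counter()
    max_size = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event "):
            continue
        try:
            ev = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip truncated lines
        etype = ev.get("Type")
        if etype not in event_types:
            continue
        counts[etype] += 1
        raw = ev.get(size_field)
        if raw is not None:
            max_size[etype] = max(max_size.get(etype, 0), int(float(raw)))
    return counts, max_size


if __name__ == "__main__":
    wanted = {"HugeArenaSample", "GetMagazineSample"}
    total, largest = Counter(), {}
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            c, m = summarize(f, wanted)
        total.update(c)
        for k, v in m.items():
            largest[k] = max(largest.get(k, 0), v)
    for etype in sorted(wanted):
        print(etype, total[etype], largest.get(etype))
```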

walter.Liang · March 12, 2026, 6:49am

Thank you very much for your reply!

From the TotalMemory values in the production logs, I observed that FastAllocator uses only about 2 GB, even though the storage process uses 6.7 GB overall. As the process memory increases, TotalMemory does not change significantly.
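The figures above can be turned into an explicit gap. With about 6.7 GB resident and about 2 GB in FastAllocator, roughly 4.7 GB (about 70%) of the storage process is not accounted for by TotalMemory. A minimal sketch of that arithmetic, using the hypothetical helper `outside_fastalloc`:

```python
def outside_fastalloc(rss_gb, fastalloc_gb):
    """Return (GB, fraction of RSS) not attributed to FastAllocator's TotalMemory."""
    gap = rss_gb - fastalloc_gb
    return gap, gap / rss_gb


if __name__ == "__main__":
    # Figures from this post: ~6.7 GB resident, ~2 GB FastAllocator TotalMemory.
    gap, frac = outside_fastalloc(6.7, 2.0)
    print(f"{gap:.1f} GB ({frac:.0%}) of the storage process is outside FastAllocator")
```

Repeating the calculation for each monitoring sample shows whether the growth falls entirely in the non-FastAllocator portion, which is what the flat TotalMemory values suggest.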
In the test environment, I used jeprof to observe jemalloc's allocations. When the process memory increased, neither jemalloc nor FastAllocator showed any change in memory usage.
For HugeArenaSample, the maximum size I observed in the logs was only 31 KB. No GetMagazineSample events were found in the logs.
With the help of AI, I analyzed the code and suspect that the SQLite B-tree might be using a large amount of memory, since I noticed there is no limit on the B-tree's memory usage. Is this possible?
I will continue investigating the issue.

Thank you again for your response! If you have any suggestions or questions, please let me know.
