We are running a three-node FoundationDB cluster as the key-value backend in our production environment.
On each machine, we have deployed one storage process, one transaction process, and two stateless processes.
The redundancy mode is set to double, and each machine has 4 cores and 16 GB of RAM.
We are using FoundationDB version 7.3.6, and the storage engine is ssd-2.
The cluster was started on December 26, 2025. Around February 5, 2026, we observed that all storage processes restarted, most likely due to an out-of-memory (OOM) condition.
All settings use their default values: memory is set to 8 GB and cache-memory to 2 GB.
From system-level monitoring, we saw that memory usage per machine started at around 2 GB, then gradually and steadily increased to about 10 GB over the course of approximately 30 days.
Shortly after reaching that peak, memory usage dropped sharply back to 2 GB, which we believe corresponds to the moment the storage processes killed themselves and were restarted.
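Taking the figures above at face value, the growth works out to roughly (10 - 2) GB / 30 days, or about 0.27 GB per day per machine. A minimal sketch for turning monitoring samples into a trend and a projected crossing time, assuming the samples are exported as (day, GB) pairs; the two points used here are only the endpoints from the description, the 12 GB threshold is a hypothetical alert level rather than a measured value, and the helper names are illustrative:

```python
def fit_line(samples):
    """Ordinary least-squares fit of gb = slope * day + intercept.

    samples: list of (day, gigabytes) pairs covering at least two distinct days.
    (Illustrative helper, not part of any FoundationDB tooling.)
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need samples from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(limit_gb, samples):
    """Day (counted from day 0) at which the fitted trend reaches limit_gb.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    return (limit_gb - intercept) / slope


if __name__ == "__main__":
    # Endpoints from the description: ~2 GB at the start, ~10 GB after ~30 days.
    observed = [(0, 2.0), (30, 10.0)]
    slope, _ = fit_line(observed)
    print(f"growth: {slope:.3f} GB/day")  # growth: 0.267 GB/day
    # Hypothetical host-level alert threshold of 12 GB.
    print(f"12 GB reached after ~{days_until(12.0, observed):.1f} days")  # ~37.5 days
```

A straight-line fit is only meaningful while the growth stays as steady as described; with many samples instead of two endpoints, the slope is also less sensitive to any single reading.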
We have confirmed that the memory growth was driven by the storage processes, but we're still trying to understand why their memory usage grew steadily in this way.
We would greatly appreciate any insights, suggestions, or similar experiences from the community!
FDB uses a custom memory allocator, FastAllocator (see the Memory Considerations page on the apple/foundationdb GitHub wiki), which reports the memory usage of its different block sizes in MemoryMetrics events. You might want to graph the TotalMemory field over time to see whether FastAllocator is gradually using more memory. If it is, the behavior is expected, because FastAllocator does not return free pages back to the OS (for memory usage efficiency). In that case I'd recommend increasing the memory limit of the storage server (SS) roles. Note that not all memory is allocated through FastAllocator.
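One way to produce that graph, sketched in Python under a few assumptions: the trace files use the default XML format with one Event element per line, each event carries Type and Time attributes, and the field is named TotalMemory as in this thread. Please check these names against your own files; the script takes the event type and field as parameters so they can be swapped. Run it only on the storage process's trace files, whose names typically include the process address and port. The script and helper names are illustrative.

```python
import sys
import xml.etree.ElementTree as ET


def extract_series(lines, event_type="MemoryMetrics", field="TotalMemory"):
    """Return [(time, value)] for every trace event of event_type carrying field.

    Assumption: XML trace format, one <Event .../> per line, with "Type" and
    "Time" attributes. Lines that are not complete Event elements (the XML
    header, <Trace>, a partially written last line) are skipped.
    """
    series = []
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue
        try:
            event = ET.fromstring(line)
        except ET.ParseError:
            continue  # e.g. a truncated line at the end of a live trace file
        if event.get("Type") != event_type:
            continue
        time, value = event.get("Time"), event.get(field)
        if time is None or value is None:
            continue
        try:
            series.append((float(time), float(value)))
        except ValueError:
            continue  # skip non-numeric values
    return series


if __name__ == "__main__":
    # Usage (hypothetical file name): python memseries.py trace.*.xml > total_memory.tsv
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for t, v in extract_series(f):
                print(f"{t:.0f}\t{v / 2**30:.3f}")  # time in seconds, value in GiB
```

The tab-separated output can be loaded into any plotting tool; a flat TotalMemory line alongside rising process memory would point away from FastAllocator.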
From the TotalMemory values in the production environment logs, I observed that FastAllocator uses only about 2 GB of memory, even though the storage process's overall memory usage is 6.7 GB. As process memory grows, TotalMemory does not change significantly.
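To check whether the part that grows is the roughly 4.7 GB outside FastAllocator (6.7 GB total minus about 2 GB), each process-level memory sample can be paired with the nearest-in-time FastAllocator sample and the difference plotted over time. A minimal sketch, assuming both series are already available as (time, bytes) lists; which trace field holds the process-level figure (for example a resident-memory field in ProcessMetrics events) is an assumption to verify against the actual logs, and the helper name is illustrative:

```python
from bisect import bisect_left


def untracked_memory(process_series, fastalloc_series):
    """For each process-level sample, memory not accounted for by FastAllocator.

    Both arguments are [(time, amount)] lists in the same unit. Each process
    sample is paired with the FastAllocator sample closest to it in time.
    (Illustrative helper; input field names are assumptions.)
    """
    if not fastalloc_series:
        raise ValueError("need at least one FastAllocator sample")
    fa = sorted(fastalloc_series)
    fa_times = [t for t, _ in fa]
    result = []
    for t, total in sorted(process_series):
        i = bisect_left(fa_times, t)
        # Compare the neighbours on either side of t and keep the closer one.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(fa)]
        j = min(candidates, key=lambda k: abs(fa_times[k] - t))
        result.append((t, total - fa[j][1]))
    return result


if __name__ == "__main__":
    # Illustrative values in GB, loosely based on the numbers above.
    process = [(0, 3.0), (10, 6.7)]
    fastalloc = [(1, 2.0), (9, 2.0)]
    for t, gap in untracked_memory(process, fastalloc):
        print(f"t={t}: {gap:.1f} GB outside FastAllocator")
```

If this difference tracks the overall growth while TotalMemory stays flat, the memory is being held somewhere other than FastAllocator.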
In the test environment, I used jeprof to observe jemalloc's allocations. While the process memory was increasing, neither jemalloc nor FastAllocator showed any change in memory usage.
As for HugeArenaSample, the largest size I found in the logs was only 31 KB. In addition, there was no GetMagazineSample output in the logs at all.
With the help of an AI assistant, I went through the code and suspect that the SQLite B-tree may be using a large amount of memory, since I could not find any limit on its memory usage. Is this possible?
I will keep investigating the issue.
Thank you again for your response! If you have any suggestions or questions, I would appreciate hearing them.