Smaller memory allocations for stateless pods?

We’re building out a new FoundationDB cluster, and I’m thinking about resource allocations after an initial load test. We’re using the FDB Kubernetes Operator and previously ran a load test that handled 100% of our write traffic for 24 hours (hooray!). We used the default resource allocations of 1 CPU and 8 GiB of memory for all fdbserver containers.

In that load test, we observed:

  1. Storage processes are memory-hungry, and we’d like to throw some more memory at them to see if that yields any measurable improvements
  2. Log processes never seemed to use more than 4 GiB of memory, even under heavy sustained load
  3. Stateless processes never seemed to use more than 0.5 GiB, even under heavy sustained load

I’ve seen various bits of advice here and in the docs. I regret that I don’t have a comprehensive set of citations, but my summary at this point is that 8 GiB is, indeed, the recommended minimum these days. More recently, though, I noticed this bit of advice:

As a general rule, I think setting memory to (1.5 * cache_memory + 4GB) would be a stable configuration.
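
(If I have the defaults right, cache_memory defaults to 2 GiB, so that rule of thumb works out to 1.5 × 2 GiB + 4 GiB = 7 GiB for a default-configured process, which lines up roughly with the 8 GiB minimum.)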

My question: since stateless processes don’t use cache_memory at all (right?), is it safe to reduce their memory setting to 4 GiB, or does it make more sense to keep them at 8 GiB for reasons that didn’t become evident in our (admittedly short!) load test?

For context, the motivation for reducing the memory allocation for stateless processes is to make room to increase the allocation for storage (and log) processes. We’re currently using machines with 4 vCPU, 32 GiB of memory, and one SSD. After subtracting a little overhead for background processes/sidecars, I think we can fit three fdbserver processes (we have three uncontested cores and one core that’s divided between those auxiliary processes) and divide ~30 GiB of memory between them. Since there’s just one disk, I think our most common configuration will be one storage process with two stateless processes. The more we can shrink the stateless processes’ memory requirement, the more memory we can give the storage process.
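To make that concrete, here’s a rough sketch of the per-class split I’m imagining in the operator’s FoundationDBCluster spec. The field layout is from my reading of the fdb-kubernetes-operator docs and the numbers are only illustrative, so treat it as a sketch rather than a tested config (I’m also not certain whether the operator derives --memory from the container resources on its own, so I’ve spelled it out via customParameters):

```yaml
# Illustrative only: one storage process gets the bulk of the node's memory,
# the stateless processes get the smaller allocation this question is about.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: example-cluster
spec:
  version: 7.1.26
  processes:
    storage:
      customParameters:
        - "memory=20GiB"        # bulk of the ~30 GiB per-node budget
        - "cache_memory=10GiB"  # keeps 1.5 * cache_memory + 4 GiB under the 20 GiB limit
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: "1"
                  memory: 20Gi
    stateless:
      customParameters:
        - "memory=4GiB"         # the reduction in question
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: "1"
                  memory: 4Gi
```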

Thanks!


You are correct that stateless processes do not use cache_memory. However, memory requirements vary by stateless role, and some in-memory structures grow with cluster size, so for the generic stateless process class the recommendation is to keep the limit at 8 GiB, since those processes can be assigned any stateless role. To go lower for specific processes, you would need to use more specific process classes to control which roles run where. Be aware, though, that during cluster recovery the recruitment logic operates on a “best fit” basis: if you configure 6 commit proxies but only 3 of your commit-proxy-class fdbserver processes are alive during recruitment, it will pick 3 other processes so that all 6 configured instances of the role still run.
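For example, pinning commit proxies to their own class might look roughly like the sketch below. This is only a sketch: the processCounts key and process class name are my assumptions based on how the operator names other classes, so check the CRD reference for the operator version you’re running.

```yaml
# Sketch: dedicate processes to the commit_proxy class so commit proxies
# normally run only there, and give that class a smaller memory limit.
spec:
  databaseConfiguration:
    commit_proxies: 6       # desired number of commit proxy roles
  processCounts:
    commit_proxy: 6         # processes created with the commit_proxy class
  processes:
    commit_proxy:
      customParameters:
        - "memory=4GiB"     # lower limit is plausible here, but note the recovery
                            # caveat above: other roles can still land on these
                            # processes if better-fit processes are unavailable
```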

Thanks! That’s helpful, and I appreciate it!

It sounds like you’ve done a solid initial test and have a good read on your cluster’s memory usage. Based on your observations, trimming the stateless processes is reasonable: they don’t use cache_memory and stayed around 0.5 GiB even under heavy sustained load, so reducing them to 4 GiB is likely fine as long as monitoring doesn’t show memory pressure afterward. The more important consideration is that the storage and log processes stay adequately resourced, since they are the memory-hungry components. With 32 GiB per machine, shifting memory from the stateless processes to the storage process should improve overall utilization. Just keep an eye on the cluster after the change and confirm there’s no regression in performance or stability.