How to increase "storage server MVCC memory"?


I run an FDB 6.1.8 cluster and got this from fdbcli status:

  Performance limited by process: Storage server MVCC memory.
  Most limiting process:

However, memory usage of that process is quite low, so I don’t think this is a matter of increasing “memory = X GiB” in foundationdb.conf ( 47% cpu; 31% machine; 9.110 Gbps; 13% disk IO; 2.5 GB / 11.5 GB RAM )

How can I increase that MVCC memory? Or what should I do when I encounter this issue?

== UPDATE ==

From the source code it looks like I could increase MAX_TRANSACTIONS_PER_BYTE, whose default value is 1000. I’m not quite sure what it means, though. Max bytes per transaction? Our (key, value) pairs are larger than 7K, so is one FDB transaction split into multiple smaller transactions internally?

Commonly this is what storage server saturation shows up as, so the answer here might be to run more storage servers instead of increasing the memory limit. Queues generally fill up when input rate > output rate, or in this case, rate of mutations committed is higher than what the storage server can write to disk. Sharding your writes across more storage servers would then be a better solution. (Though the in-memory MVCC structure isn’t exactly the most memory-efficient, so you could probably design a workload where MVCC memory is more limiting than disk IOPS.)
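
To make the "input rate > output rate" point concrete, here is a rough back-of-envelope sketch in Python. It is not how FoundationDB models its queues internally; the function name and the example rates are made up for illustration, and it assumes constant input and output rates.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def seconds_until_full(capacity_bytes, input_bytes_per_s, output_bytes_per_s,
                       queued_bytes=0.0):
    """Return how long a queue takes to fill, or None if it never fills.

    Hypothetical helper: assumes constant rates, which real workloads
    rarely have.
    """
    net_growth = input_bytes_per_s - output_bytes_per_s
    if net_growth <= 0:
        # Output keeps up with input, so the queue does not grow.
        return None
    remaining = max(capacity_bytes - queued_bytes, 0.0)
    return remaining / net_growth


# Illustrative numbers only: one storage server taking 200 MiB/s of
# mutations while writing 150 MiB/s to disk, with a 1 GiB queue.
single_server = seconds_until_full(GIB, 200 * MIB, 150 * MIB)

# The same write load spread evenly over two storage servers.
per_server_when_sharded = seconds_until_full(GIB, 100 * MIB, 150 * MIB)

print("single server fills after (s):", single_server)
print("sharded across two servers:", per_server_when_sharded)
```

With these made-up rates the single server's queue fills in about 20 seconds, while each of the two sharded servers drains faster than it fills. That is the sense in which adding storage servers addresses the cause and a bigger queue only delays it.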

To answer the actual question though: to increase the memory limit, I think the knobs that you’d want to tweak are TARGET_BYTES_PER_STORAGE_SERVER, which defaults to 1GiB, and STORAGE_HARD_LIMIT_BYTES, which defaults to 1.5GiB. If you increase them, you should strongly consider increasing the memory given to the process (--memory=) by the same amount.
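
As a sketch of the arithmetic behind "by the same amount", the Python below raises both knobs by one delta and adds that delta to the process memory. The helper name, the 1 GiB increase, and the 8 GiB starting memory are assumptions for illustration; the two knob defaults are the ones quoted above. I'm also assuming knobs are passed in lowercase with a knob_ prefix (for example knob_storage_hard_limit_bytes), so check that against your version before relying on it.

```python
GIB = 1024 ** 3

# Defaults as quoted in this thread.
DEFAULT_TARGET_BYTES_PER_STORAGE_SERVER = 1 * GIB
DEFAULT_STORAGE_HARD_LIMIT_BYTES = 3 * GIB // 2  # 1.5 GiB


def raised_limits(extra_bytes, current_memory_bytes):
    """Raise both queue knobs and the process memory by the same delta.

    Hypothetical helper: it only does the arithmetic, it does not touch
    any cluster or config file.
    """
    return {
        "TARGET_BYTES_PER_STORAGE_SERVER":
            DEFAULT_TARGET_BYTES_PER_STORAGE_SERVER + extra_bytes,
        "STORAGE_HARD_LIMIT_BYTES":
            DEFAULT_STORAGE_HARD_LIMIT_BYTES + extra_bytes,
        "memory": current_memory_bytes + extra_bytes,
    }


# Assumed starting point: a process currently given 8 GiB of memory.
plan = raised_limits(extra_bytes=1 * GIB, current_memory_bytes=8 * GIB)
for name, value in plan.items():
    print(f"{name} = {value} bytes ({value / GIB:.1f} GiB)")
```

Keeping the gap between the target and the hard limit unchanged is a choice made in this sketch, not something the knobs require.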

But, as always, non-standard knob settings have been less tested, so proceed with caution.


I’ll just add to this a concrete definition of what it means to be performance limited by MVCC memory, at least in all versions up through 6.1:

Ratekeeper will not let you start more transactions than would fill the storage server queues in 7 seconds (assuming default knobs). Basically, I think it’s trying to enforce that the queue doesn’t get full simply holding the mutations required to maintain the MVCC window (which is 5 seconds). Status will report that you are performance limited and give a reason if you are running at more than 80% of the transaction rate being enforced by ratekeeper.

So in other words, to see this message you must be inserting data at a rate >80% of what ratekeeper thinks would fill up the storage server queues in 7 seconds.
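
Here is a rough Python sketch of that rule, under heavy simplification. It treats the limit as a plain byte rate and uses the 1 GiB default queue target mentioned earlier in the thread; ratekeeper's real calculation is more involved, and the function names are made up for illustration.

```python
GIB = 1024 ** 3

FILL_WINDOW_SECONDS = 7.0   # window quoted above (default knobs)
REPORT_FRACTION = 0.8       # status reports a limit above 80% of the cap


def fill_rate_limit(queue_bytes, window_seconds=FILL_WINDOW_SECONDS):
    """Byte rate that would fill the queue in the given window.

    Hypothetical simplification of what ratekeeper enforces.
    """
    return queue_bytes / window_seconds


def status_reports_limited(current_rate, limit_rate,
                           fraction=REPORT_FRACTION):
    """True if the current rate exceeds the reporting threshold."""
    return current_rate > fraction * limit_rate


limit = fill_rate_limit(1 * GIB)
threshold = REPORT_FRACTION * limit
print(f"rate that fills the queue in 7 s: {limit / 1e6:.1f} MB/s")
print(f"status reports the limit above:   {threshold / 1e6:.1f} MB/s")
print("limited at 130 MB/s?", status_reports_limited(130e6, limit))
print("limited at 100 MB/s?", status_reports_limited(100e6, limit))
```

Under these assumptions, a single storage server would need to sustain well over 100 MB/s of queued mutation bytes before the message shows up.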


Also, I should mention that while you might see this from time to time intermittently, it’s not something I’ve generally seen as being the limiter in a sustained load. Usually, you would saturate the disk (or possibly CPU) of the storage server at a rate lower than what would fill the queue in the MVCC window. In that case, you’d see a different message (such as “Storage server performance (storage queue)”). Or, you may find that this message appears sporadically, which can occur when your load is spiky. In that case, it’s kind of acting as a cap on your bursts.

If you were running into this limit for the duration of a sustained load, though, then I guess the only way to fully take advantage of your fast hardware would be to increase the queue size to accommodate the MVCC window.