Production optimizations

alexmiller · August 2, 2018, 10:32pm

The ratio of 1 log to 8 ssd storage servers is only a rough figure from what I've seen. I'm being very specific about ssd because the memory storage engine can apply mutations to durable storage faster, which changes the ratio. I think it's closer to 1:2 there, but it's not something I've benchmarked as often.
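As a rough starting point, those ratios translate into a log count like this. It is a minimal sketch that divides the storage server count by the quoted ratio and rounds up. The 1:8 and 1:2 figures are the rough numbers from this post, not benchmark results, and the function name is hypothetical.

```python
import math

# Rough storage-servers-per-log figures quoted in this post (not benchmarked values).
ROUGH_STORAGE_PER_LOG = {
    "ssd": 8,     # about 1 log per 8 storage servers
    "memory": 2,  # "closer to 1:2", benchmarked less often
}


def rough_log_count(storage_servers: int, engine: str) -> int:
    """Hypothetical helper: starting-point number of tlogs for a storage engine."""
    per_log = ROUGH_STORAGE_PER_LOG[engine]
    # Round up so a partial group of storage servers still gets a log; never go below 1.
    return max(1, math.ceil(storage_servers / per_log))


for engine in ("ssd", "memory"):
    print(engine, rough_log_count(24, engine))  # prints "ssd 3", then "memory 12"
```

Treat the result only as a first guess to refine with your own measurements.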

Ideally, you'd run a cluster configured as single ssd with 5 proxies, 5 logs, and 1 storage server, run a write-heavy benchmark, and look at the storage server's trace files to determine what rate of mutations a storage server can apply on your hardware. Then do the same with a single ssd cluster with 5 proxies, 1 log, and 10 storage servers, and see what rate of mutations one tlog can support.
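Turning each benchmark into a single number comes down to a rate over an interval. The sketch below assumes you have already pulled two timestamped readings of a cumulative mutation counter out of the trace files while the benchmark is running. It does not parse FoundationDB trace files, and the Sample type and mutation_rate helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative mutation counter (hypothetical source)."""
    time_s: float   # seconds since some fixed reference point
    mutations: int  # total mutations applied up to this time


def mutation_rate(first: Sample, last: Sample) -> float:
    """Average mutations per second between two cumulative samples."""
    elapsed = last.time_s - first.time_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (last.mutations - first.mutations) / elapsed
```

Applied to the one-storage-server run, this gives the per-storage-server apply rate. Applied to the cluster's total applied mutations in the one-log run, where the single tlog is the bottleneck, it approximates what one tlog can support.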

These two figures then give you your ratios. Each additional tlog and proxy you add won't give you quite as much benefit as the last, but your goal is to have enough tlogs to feed your storage servers at their full rate, plus a little extra headroom so that you aren't running your cluster at 100% and getting poor latency as a result.
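Putting the two figures together, a simple linear sizing might look like the following. It is a sketch under stated assumptions: both rates come from single-replication benchmarks as described above, replication overhead is ignored, target_utilization is a hypothetical knob standing in for the headroom, and only logs are sized, not proxies. Because each additional tlog helps a little less than the last, the result is better read as a lower bound than as an exact count.

```python
import math


def logs_needed(storage_servers: int,
                storage_rate: float,
                log_rate: float,
                target_utilization: float = 0.8) -> int:
    """Hypothetical helper: tlogs needed to feed storage servers at full rate.

    storage_rate: mutations/s one storage server applied (first benchmark).
    log_rate: mutations/s one tlog supported (second benchmark).
    target_utilization: fraction of tlog capacity to plan for; below 1.0 leaves headroom.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    demand = storage_servers * storage_rate         # total apply rate to sustain
    usable_per_log = log_rate * target_utilization  # per-log capacity after headroom
    # Linear model: ignores the diminishing returns of each extra tlog.
    return math.ceil(demand / usable_per_log)
```

For example, with illustrative (not measured) inputs of 20 storage servers at 5,000 mutations/s each and 40,000 mutations/s per tlog, logs_needed(20, 5000, 40000) returns 4, while planning for 100% utilization would return 3.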

This assumes, of course, that you're targeting full write throughput. If you have a lot of cold data and care more about commit latency, then running with fewer proxies or logs than your maximum would be a better option.

A while ago I filed "Storage Server recruitment should consider existing recruited roles" (#552), since explicitly assigning process classes is mostly about keeping storage servers away from other latency-critical parts of the system. I don't really have any ideas for how we could do better auto-configuration of proxies and logs, as changing them invokes a recovery, and a recovery means O(hundreds of milliseconds) of write downtime or a latency spike.
