Scalability performance benchmark

I’m aware of one user running over 1PB in a single cluster for an analytics workload. I wouldn’t necessarily say that is a good idea, though. You can run more than one cluster. :grinning:

If you’re expecting many TB of data and have a clean way to partition it such that a failure in one partition cannot impact data in another, run multiple clusters. This isn’t an FDB-specific thing.
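
To make that concrete, here’s a minimal sketch of what per-partition clusters can look like from the client side, using the Python bindings. The tenant names, cluster file paths, and keys are all illustrative, not prescriptive:

```python
# Minimal sketch: route each partition (here, a tenant) to its own cluster.
# A failure in one cluster can't touch data in the others.
import fdb

fdb.api_version(710)

# Hypothetical cluster files, one per independent cluster.
CLUSTER_FILES = {
    "tenant_a": "/etc/foundationdb/tenant_a.cluster",
    "tenant_b": "/etc/foundationdb/tenant_b.cluster",
}

# One Database handle per cluster.
databases = {name: fdb.open(path) for name, path in CLUSTER_FILES.items()}

@fdb.transactional
def set_value(tr, key, value):
    tr[key] = value

# Writes for tenant_a land only on tenant_a's cluster.
set_value(databases["tenant_a"], b"some-key", b"some-value")
```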

For performance specifically, there are a few examples in the documentation that show scalability across multiple cores and machines, as well as how an individual process handles different read/write patterns. Those examples mostly match my experience.

One thing not mentioned in the documentation: you need to understand your workload relative to how many storage processes you run per disk. If you’re storing a lot of data that will mostly be cold, you can get away with fewer processes per disk than you can with a high write rate. Fewer storage processes also mean less work spent on failure detection and similar overhead, which, as far as I understand, is one of the things that limits how large a cluster can get.
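
As a rough sketch, this is what running two storage processes against the same disk looks like in foundationdb.conf, assuming the stock packaging layout; the ports and data directories below are illustrative, and the right process count per disk depends on your write rate:

```ini
# Sketch of /etc/foundationdb/foundationdb.conf (illustrative values).
[fdbserver]
# ...shared defaults (command, logdir, etc.) live here in the real file...

# One storage process on this disk: may be enough for mostly-cold data.
[fdbserver.4500]
class = storage
datadir = /mnt/disk1/4500

# A second storage process on the same disk, for write-heavy workloads
# where a single process's CPU core becomes the bottleneck.
[fdbserver.4501]
class = storage
datadir = /mnt/disk1/4501
```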
