On the Data Engineering Podcast episode I appeared on recently, I received a question I hadn't considered very carefully before.
Is it wise to set up servers where all the processes are configured to be storage processes?
The rationale is that if a storage process fails, there is no need to go through a recovery. The potential downside is that you might need more servers than before to achieve the same level of fault tolerance.
Is this a common configuration? I had never heard of it before, and I assume someone here has at least tried it at some point if it is not obviously flawed.
In modest to large scale deployments, I think it's more efficient to keep the special servers separate from the storage servers. An FDB cluster typically scales by adding storage servers, and a small number of TLogs, resolvers, and proxies can service a comparatively large number of storage servers, so you wouldn't be saving much hardware.
Besides avoiding recoveries when a storage server goes down, you can also use your resources more efficiently by tailoring each server to its role (more memory for TLog queues, more disk for storage servers, more CPU for the master and cluster controller, etc.). Coming from the cloud world, we choose our instance and volume types for each role carefully.
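As a sketch of what that separation looks like in practice, process roles can be pinned per process in `foundationdb.conf` via the `class` option. The port numbers below are placeholders; check the FoundationDB configuration documentation for the full set of class names supported by your version:

```
## foundationdb.conf (excerpt) -- hypothetical ports, one process per role
[fdbserver.4500]
class = transaction    ## TLog-capable process, placed on a host with a fast disk

[fdbserver.4501]
class = stateless      ## proxies / resolvers / master, placed on CPU-heavy hardware

[fdbserver.4502]
class = storage        ## storage server, placed on a host with large disks
```

With classes set this way, the cluster controller will prefer to recruit each role onto the hardware you sized for it, rather than mixing TLog and storage workloads on one process.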
Another concern with having special roles share a process with the storage role is that the storage server's resources are split, which could make it consistently fall behind the other members of its team.
Good point. Related to the same consideration: I am fairly certain that data distribution ensures only a single replica of a given shard is hosted across all storage servers running on a common host (IP), but I can't seem to quickly locate the doc or forum post that describes this. Could someone please help me with a link?
You are correct - FoundationDB attempts to choose storage teams such that storage servers on the same physical hardware will not hold any of the same shards. One mechanism employed to achieve this is that all processes on the same host (and OS instance) will have the same “machine ID”. However, virtual machines or containers running on the same physical machine might interfere with this mechanism so the machine ID should be set explicitly for each process. See here for more details.
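For illustration, one way to set the machine ID explicitly is via the machine-ID locality option in `foundationdb.conf`; the key below corresponds to fdbserver's `--locality_machineid` command-line option, and the ID values are placeholders (verify the exact spelling against your FDB version's documentation):

```
## foundationdb.conf (excerpt) -- two containers running on the same physical host
## Giving both processes the same machine ID tells data distribution they can
## fail together, so no two replicas of a shard are placed across them.
[fdbserver.4500]
locality_machineid = physical-host-01

[fdbserver.4501]
locality_machineid = physical-host-01
```

Without this, each container would report its own machine ID and FoundationDB could unknowingly place two replicas of the same shard on one physical machine.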
I actually cannot find anything in our documentation that explains exactly this feature, but the link above includes the general statement “FoundationDB takes pains to assure that data replication is protected from hardware-correlated failures” which does cover the case you are asking about.