Configuration to tolerate more storage servers falling behind

leonliao · January 19, 2022, 8:06am

In the Administration page of the FoundationDB 7.1 documentation, there is this description:

Storage queue size: cluster.qos.worst_queue_bytes_storage_server contains the maximum size in bytes of a storage queue. Each storage server has mutations that have not yet been made durable, stored in its storage queue. If this value gets too large, it indicates a storage server is falling behind. A large storage queue will cause the ratekeeper to increase throttling. However, depending on the configuration, the ratekeeper can ignore the worst storage queue from one fault domain. Thus, ratekeeper uses cluster.qos.limiting_queue_bytes_storage_server to determine the throttling level.

What does

depending on the configuration, the ratekeeper can ignore the worst storage queue from one fault domain.

mean? What is the configuration? Is it the knob MAX_MACHINES_FALLING_BEHIND? What configuration lets the cluster tolerate more storage servers falling behind?
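For anyone who wants to compare the two metrics quoted above, here is a minimal sketch that pulls them out of a parsed status document, such as the output of `fdbcli --exec 'status json'`. The field paths come from the quoted documentation. The helper name `queue_metrics` and the sample numbers are made up for illustration.

```python
import json


def queue_metrics(status):
    """Return (worst, limiting) storage queue sizes in bytes from a status dict.

    Either value is None if the field is missing from the document.
    """
    qos = status.get("cluster", {}).get("qos", {})
    worst = qos.get("worst_queue_bytes_storage_server")
    limiting = qos.get("limiting_queue_bytes_storage_server")
    return worst, limiting


# Hypothetical sample document; the numbers are invented for illustration.
sample_status = json.loads(
    """
    {
      "cluster": {
        "qos": {
          "worst_queue_bytes_storage_server": 950000000,
          "limiting_queue_bytes_storage_server": 120000000
        }
      }
    }
    """
)

worst, limiting = queue_metrics(sample_status)
print(f"worst={worst} limiting={limiting}")
```

If the worst value is much larger than the limiting value, that would suggest the largest queue is one of those being ignored for throttling purposes.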

alexmiller · January 20, 2022, 1:25am

Ah, “configuration” is indeed overloaded there. I believe it means the replication configuration.

If you ran configure single on your database, then ratekeeper can’t ignore the worst storage queue, because that storage server is always the only replica of some piece of data. In any other replication configuration, ratekeeper will already ignore the worst storage queues from one fault domain. No additional changes are needed.

If you’re running a triple configuration, and one storage server falls behind in a team, then ratekeeper won’t begin limiting. If, in a different storage team, a storage server in the same zone_id starts falling behind, it too will be ignored. If, in a third storage team, yet another storage server begins falling behind, but this one is in a different zone_id, then ratekeeper will begin limiting. The rule here specifically is “one zone of worst_storage_queues may be ignored”, not “the worst storage server in each team may be ignored”.
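The rule can be sketched as a small simulation. This is only an illustration of the behavior described above, not ratekeeper’s actual code: the `StorageServer` type, the `limiting_storage_server` function, and the `ignorable_zones` parameter are hypothetical names, and the queue sizes are invented. Setting `ignorable_zones=0` stands in for the single replication case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageServer:
    name: str
    zone_id: str
    queue_bytes: int


def limiting_storage_server(servers, ignorable_zones=1):
    """Return the server whose queue would drive throttling, or None.

    Walk the servers from largest queue to smallest. The first
    `ignorable_zones` distinct zones encountered are ignored entirely,
    including every other server in those zones. The first server found
    outside the ignored zones is the limiting one.
    """
    ignored_zones = set()
    for ss in sorted(servers, key=lambda s: s.queue_bytes, reverse=True):
        if ss.zone_id in ignored_zones:
            continue  # same zone as an already-ignored server
        if len(ignored_zones) < ignorable_zones:
            ignored_zones.add(ss.zone_id)  # ignore this whole zone
            continue
        return ss
    return None


# Hypothetical cluster: two lagging servers in zone z1, one in zone z2.
example = [
    StorageServer("ss1", "z1", 900_000_000),
    StorageServer("ss2", "z1", 800_000_000),
    StorageServer("ss3", "z2", 700_000_000),
    StorageServer("ss4", "z3", 10_000_000),
]

print(limiting_storage_server(example).name)                     # ss3
print(limiting_storage_server(example, ignorable_zones=0).name)  # ss1
```

Here ss1 and ss2 share zone z1, so both are skipped, and ss3 in zone z2 becomes the limiting queue. Whether throttling actually starts then depends on how large that limiting queue is.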
