Knobs to control data relocation?

panghy · September 29, 2018, 1:15am

We have used RELOCATION_PARALLELISM_PER_SOURCE_SERVER to control data movement in the past, but is that still the right way to do it? Basically, we want to reduce the urgency with which the cluster heals itself when it has one node left, because healing can push log queues high enough that cluster throughput suffers.

ashishnm · September 29, 2018, 3:40pm

We use the following two knobs on clusters, especially where we are IOPS-limited:
knob_relocation_parallelism_per_source_server: 2
knob_fetch_keys_parallelism_bytes: 4000000
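
For readers wiring these values into their own deployment tooling, here is a minimal sketch that renders the two knobs above as `knob_<name> = <value>` lines. The helper name `render_knob_lines` is hypothetical, and the assumption that the lines belong in the `[fdbserver]` section of `foundationdb.conf` should be checked against the configuration docs for your version.

```python
def render_knob_lines(knobs):
    """Render knob names and values as 'knob_<name> = <value>' lines.

    Assumption: the target is an ini-style server config that accepts
    lower-case knob names prefixed with 'knob_'.
    """
    lines = []
    for name, value in knobs.items():
        key = name.lower()
        # Accept both RELOCATION_... and knob_relocation_... spellings.
        if not key.startswith("knob_"):
            key = "knob_" + key
        lines.append(f"{key} = {value}")
    return lines


# Values taken from the post above.
knobs = {
    "RELOCATION_PARALLELISM_PER_SOURCE_SERVER": 2,
    "FETCH_KEYS_PARALLELISM_BYTES": 4000000,
}

for line in render_knob_lines(knobs):
    print(line)
```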

panghy · October 4, 2018, 7:51am

Yeah, we have those, but they don’t seem to be able to control log queues (storage queues are OK).

ajbeamon · October 5, 2018, 6:37pm

How large are the log queues growing? They are expected to grow during a failure up to at least 1.5 GB.
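
One way to answer that question is to read the worst log queue size out of the cluster's machine-readable status. The sketch below is only illustrative: it assumes the status JSON (for example from `fdbcli --exec "status json"`) exposes `cluster.qos.worst_queue_bytes_log_server`, which should be verified against your version's status schema. The helper names and the sample document are hypothetical; the 1.5 GB threshold is the figure mentioned above.

```python
import json

# Figure quoted in the post above: queues are expected to reach at least 1.5 GB.
EXPECTED_LOG_QUEUE_BYTES = 1.5e9


def worst_log_queue_bytes(status):
    """Return the worst log queue size in bytes, or None if it is not reported.

    Assumption: the field lives at cluster.qos.worst_queue_bytes_log_server.
    """
    return status.get("cluster", {}).get("qos", {}).get("worst_queue_bytes_log_server")


def log_queue_report(status_text, threshold=EXPECTED_LOG_QUEUE_BYTES):
    """Summarize the worst log queue size relative to a threshold."""
    worst = worst_log_queue_bytes(json.loads(status_text))
    if worst is None:
        return "log queue size not reported"
    state = "above" if worst > threshold else "at or below"
    return f"worst log queue: {worst / 1e9:.2f} GB ({state} {threshold / 1e9:.2f} GB)"


# Hypothetical sample document, not real cluster output.
sample = '{"cluster": {"qos": {"worst_queue_bytes_log_server": 1800000000}}}'
print(log_queue_report(sample))
```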

panghy · October 5, 2018, 11:28pm

Yeah, but that causes enough slowness (ratekeeper) that read and write latencies are noticeably affected. What we’re after is reducing the impact of healing: it should not be so aggressive that there’s little headroom left for the cluster to handle, for instance, a spike in traffic.

panghy · October 5, 2018, 11:53pm

A simulated failure of a node basically pegged the tlogs at 1.8 GB or above, which means any additional write load could cause latencies to spike. Contrast that with just adding a new node (which results in low-priority moves): that doesn’t have the same impact on the cluster.
