We have used RELOCATION_PARALLELISM_PER_SOURCE_SERVER to control data movement in the past, but is that still the right way to do it? Basically, we want to reduce the urgency with which the cluster heals itself when it has one node left, because healing can push the log queues high enough that cluster throughput suffers.
We use the following two knobs on clusters, especially where we are IOPS-limited.
knob_relocation_parallelism_per_source_server: 2
knob_fetch_keys_parallelism_bytes: 4000000
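For anyone setting these through `foundationdb.conf` rather than a deployment tool, here is a minimal sketch of how the two values above might be rendered, assuming the usual `knob_<name> = <value>` form under the `[fdbserver]` section. The helper name `render_fdbserver_knobs` is hypothetical and only for illustration; check the output against your own config layout before using it.

```python
def render_fdbserver_knobs(knobs):
    """Render a dict of knob settings as a [fdbserver] config fragment.

    Assumption: knobs are written as `knob_<name> = <value>` lines,
    matching the lower-case naming used in this thread.
    """
    lines = ["[fdbserver]"]
    for name, value in knobs.items():
        key = name.lower()
        # Add the knob_ prefix only if the caller did not supply it.
        if not key.startswith("knob_"):
            key = "knob_" + key
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


# The two values quoted above.
settings = {
    "relocation_parallelism_per_source_server": 2,
    "fetch_keys_parallelism_bytes": 4_000_000,
}
print(render_fdbserver_knobs(settings))
```

Knob changes made this way generally take effect only when the fdbserver processes are restarted with the new configuration.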
Yeah, we have those set, but they don't seem to control the log queues (the storage queues are OK).
How large are the log queues growing? They are expected to grow during a failure up to at least 1.5 GB.
Yeah, but that causes enough slowdown (via ratekeeper) that read and write latencies are noticeably affected. What we're after is reducing the impact of healing: right now it is aggressive to the point where there is little headroom left for the cluster to handle, for instance, a spike in traffic.