Unlimited cloud servers to the rescue!
Data distribution speed has been a tricky topic. It’s been tuned down over time because data distribution was seen to impact client latencies. For example, see “How to speed up balancing?” for a report of data distribution degrading throughput while it runs. However, as you’ve noticed, there are times when you’d prefer data distribution to run as quickly as possible.
So the root issue here is that data distribution runs at a static rate, and we need to turn that into a dynamic rate based on two things: how much idle disk/CPU time the cluster has, and the priority of the data distribution action. Smoothing out a 5% data imbalance should be throttled down in the face of user traffic. Moving half of a full storage server to an empty one, on the other hand, should be able to use all the resources of a storage server if ratekeeper has already blocked all client traffic to the cluster.
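To make the idea concrete, here is a rough sketch of what a dynamic rate could look like. This is not how FoundationDB does it: `ClusterLoad`, `distribution_rate`, the 0-to-1 priority scale, and the rate limits are all hypothetical names and numbers picked for illustration, and multiplying idle headroom by priority is just one possible way to combine them.

```python
from dataclasses import dataclass


@dataclass
class ClusterLoad:
    """Hypothetical snapshot of cluster load; not a real FoundationDB structure."""
    idle_disk: float               # fraction of disk time that is idle, 0.0 to 1.0
    idle_cpu: float                # fraction of CPU time that is idle, 0.0 to 1.0
    client_traffic_blocked: bool   # True if ratekeeper has blocked all client traffic


def distribution_rate(load: ClusterLoad, priority: float,
                      min_rate: float, max_rate: float) -> float:
    """Return a data movement rate between min_rate and max_rate.

    priority is an assumed 0.0 (cosmetic rebalance) to 1.0 (urgent move) scale.
    """
    priority = max(0.0, min(1.0, priority))
    if load.client_traffic_blocked:
        # No client traffic left to protect, so treat everything as idle.
        headroom = 1.0
    else:
        # The scarcer of disk and CPU limits how much we can take.
        headroom = max(0.0, min(1.0, min(load.idle_disk, load.idle_cpu)))
    # Assumed policy: scale the available headroom by the action's priority.
    share = headroom * priority
    return min_rate + (max_rate - min_rate) * share


# A 5% imbalance fix on a busy cluster stays close to the minimum rate.
busy = ClusterLoad(idle_disk=0.2, idle_cpu=0.5, client_traffic_blocked=False)
small_fix = distribution_rate(busy, priority=0.1, min_rate=10.0, max_rate=100.0)

# An urgent move while client traffic is blocked gets the full rate.
blocked = ClusterLoad(idle_disk=0.0, idle_cpu=0.0, client_traffic_blocked=True)
urgent_move = distribution_rate(blocked, priority=1.0, min_rate=10.0, max_rate=100.0)
```

With this policy a low-priority action stays slow even on an idle cluster. Whether that is desirable is a design question, and a real implementation would likely want something less blunt.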
#1046 is the issue to follow for this. @fzhjon wrote “Introduce priority to fetchKeys requests from data distribution” (#1791), which is the first step in this direction. I’m actually unsure how much more work is scheduled to happen on this before 7.0… @fzhjon?