This is a follow-up to this question here, where we noticed high disk IO during a large number of delete operations. The original problem with the latency of the delete transactions was fixed, but we now have a throughput problem.
For context: we have a batch job that clears data older than x days, and depending on the run date it can clear up to 5-10% of the data in the cluster (> 10 million keys). Our cluster size is about 140 GB, and post-delete it comes down to about 132 GB. We run a triple-replication cluster with the SSD-2 storage engine across 5 machines, 3 storage servers per machine. Server version is 6.2.19.
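To be concrete, the purge job does roughly the following (a simplified sketch; the key layout, prefixes, and date handling here are placeholders, not our real schema):

```python
import fdb

fdb.api_version(620)
db = fdb.open()

# Hypothetical key layout for illustration only: keys look like
# b"data/<YYYY-MM-DD>/...". The real schema and batching differ.
@fdb.transactional
def clear_day(tr, date_str):
    # One range clear per expired day keeps each transaction small; the
    # storage servers reclaim the freed space asynchronously afterwards.
    tr.clear_range_startswith(b"data/" + date_str.encode())

def purge(db, expired_dates):
    for d in expired_dates:
        clear_day(db, d)

# e.g. purge(db, ["2021-01-01", "2021-01-02"])
```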
Now the problem: sometimes, while the delete is running, the DB starts moving a significant amount of data. This pushes almost all of the storage servers to 60-90% disk utilization, and eventually Ratekeeper kicks in. During the peak contention period I see TPSLimit go as low as “2.142…”, and we notice significant delays in opening transactions.
In the DataDistributor log I see AverageShardSize go from 33953350 to 32671480, with a peak of InQueue=350; InProgress goes up to 75. The number of shards comes down from approx. 4500 to 4120 during this period.
The whole redistribution takes about 2 hours, after which everything returns to normal. We are not setting any knobs for min/max shard size. We are running with spring cleaning turned off (--knob_spring_cleaning_max_vacuum_pages 0). I can pull more information from the logs if required.
Would explicitly setting the min shard size (to a very low number) and the max shard size (to about 100 MB?) alleviate this problem? Also, is there a way to slow down this data movement? The queue depth I see in the DD logs is worrying me. Please help.
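For concreteness, what I had in mind is something like the following in the [fdbserver] section of foundationdb.conf. The knob names are my guess based on the MIN_SHARD_BYTES / MAX_SHARD_BYTES server knobs and the values are placeholders, so please correct me if these are the wrong knobs to touch:

```
[fdbserver]
# guesses, not tested: ~500 KB minimum shard size, ~100 MB maximum shard size
knob_min_shard_bytes = 500000
knob_max_shard_bytes = 100000000
```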