We recently upgraded from FoundationDB 5.2.x (a custom build by @panghy) to the public release 6.2.7.
One of our clusters has been stuck moving data and repartitioning for days now. The disks are showing erratic data usage, and I have been digging through the trace logs to figure out why it's near-constantly moving hundreds of gigabytes of data, but I have not been able to determine the cause.
The cluster reports healthy but is constantly moving data (no errors or messages reported).
You can see here that the constant line on the left is 5.2.x, where even under load we were not moving much data, tens to hundreds of megabytes. Once the upgrade completed, we are near-constantly moving hundreds of gigabytes.
Disk usage drastically changed around the time of the upgrade as well.
This is on an internal test cluster. We have now pulled all load off the cluster to observe how it behaves, and while the disk usage has calmed quite a bit, we are still seeing a lot of data movement with no end in sight.
I have poked through the trace logs across the machines but haven't found any smoking guns yet. What other things should I be looking for here?
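
In case it helps anyone reproduce the search, below is a minimal sketch of one way to tally event types across trace files so that unusually frequent ones stand out. It assumes the default XML trace format, where each event is a single `<Event ... Type="..." />` line; the function names and the example glob path are hypothetical, and the event names in the usage comment are only examples of what might show up.

```python
import re
from collections import Counter

# Matches the Type attribute of a single-line <Event .../> element.
# Assumption: default XML trace format, one event per line.
TYPE_RE = re.compile(r'<Event\b[^>]*?\bType="([^"]*)"')


def count_event_types(lines):
    """Count how often each event Type appears in an iterable of trace lines."""
    counts = Counter()
    for line in lines:
        match = TYPE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_event_types_in_files(paths):
    """Aggregate event Type counts over several trace files."""
    total = Counter()
    for path in paths:
        # errors="replace" so a partially written file does not abort the scan.
        with open(path, errors="replace") as handle:
            total.update(count_event_types(handle))
    return total


# Example usage (hypothetical log location):
#   import glob
#   counts = count_event_types_in_files(glob.glob("/var/log/foundationdb/trace.*.xml"))
#   for event_type, n in counts.most_common(20):
#       print(n, event_type)
```

Scanning line by line rather than with a full XML parser is deliberate here: a trace file that is still being written may not have its closing tag yet.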
One additional strange note: if I re-run the command
    configure double ssd
the configuration is always reported as changed and the cluster goes back into reinit.
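
To compare what the cluster reports before and after re-running that command, a small sketch like the following could pull the relevant fields out of `status json`. The field names (`cluster.data.moving_data` and `cluster.configuration`) are my reading of the status JSON layout and should be treated as assumptions to check against the real output; `summarize_status` and `fetch_status` are hypothetical helper names.

```python
import json
import subprocess


def summarize_status(status):
    """Extract data-movement and configuration fields from a parsed status document.

    Assumption: the fields live under cluster.data.moving_data and
    cluster.configuration. Missing fields come back as None.
    """
    cluster = status.get("cluster", {})
    moving = cluster.get("data", {}).get("moving_data", {})
    config = cluster.get("configuration", {})
    return {
        "in_flight_bytes": moving.get("in_flight_bytes"),
        "in_queue_bytes": moving.get("in_queue_bytes"),
        "redundancy_mode": config.get("redundancy_mode"),
        "storage_engine": config.get("storage_engine"),
    }


def fetch_status():
    """Run `fdbcli --exec "status json"` and parse its output as JSON."""
    result = subprocess.run(
        ["fdbcli", "--exec", "status json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Example usage on a machine with fdbcli and a cluster file available:
#   print(summarize_status(fetch_status()))
```

Capturing this summary once before and once after the configure command would show whether the reported redundancy mode or storage engine actually differs between the two runs.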