Large excludes failing repeatedly

2021-02-11T11:19:48.082Z	INFO	controller	Running command	{"namespace": "timeseries", "cluster": "foundationdb-cluster-1", "path": "/usr/bin/fdb/6.2/fdbcli", "args": ["/usr/bin/fdb/6.2/fdbcli", "--exec", "exclude 10.0.254.134 10.244.62.11 10.0.79.183 10.244.69.12 10.0.54.48 10.244.111.14 10.244.220.13 10.244.175.8 10.244.92.11 10.244.173.9 10.244.101.8 10.244.60.12 10.0.135.254 10.244.83.9 10.0.208.30 10.244.74.9 10.0.29.165 10.244.77.13 10.0.127.206 10.244.227.10 10.0.144.250 10.244.228.10 10.0.215.51 10.244.88.9 10.0.210.156 10.244.75.11 10.0.87.196 10.244.106.13 10.0.213.53 10.244.63.10 10.0.85.224 10.244.70.12 10.0.16.24 10.244.183.12 10.0.38.249 10.244.100.12 10.0.119.184 10.244.98.10 10.0.22.166 10.244.22.15 10.0.89.9 10.244.110.12 10.244.216.12 10.244.93.13 10.244.52.13 10.0.71.227 10.244.209.12 10.0.35.98 10.244.90.12 10.0.131.63 10.244.105.14 10.0.26.28 10.244.109.13 10.0.129.20 10.244.56.12 10.0.244.3 10.244.58.11 10.0.191.229 10.244.102.12 10.0.189.192 10.244.48.13 10.0.227.17 10.244.176.10 10.244.79.15 10.244.64.10 10.244.81.12 10.244.230.11 10.244.50.10 10.244.221.11 10.244.202.13 10.244.218.9 10.244.222.12 10.0.93.176 10.244.108.16 10.0.12.252 10.244.87.12 10.0.56.215 10.244.215.11 10.0.218.187 10.244.76.13 10.0.106.42 10.244.223.12 10.0.249.113 10.244.95.15 10.0.202.124 10.244.247.10 10.0.12.43 10.244.172.10 10.0.64.147 10.244.133.13", "-C", "/tmp/529281271", "--log", "--timeout", "10", "--log-dir", "/var/log/fdb"]}
2021-02-11T11:19:48.254Z	ERROR	controller	Error from FDB command	{"namespace": "timeseries", "cluster": "foundationdb-cluster-1", "code": 1, "stdout": "ERROR: This exclude may cause the total free space in the cluster to drop below 10%.\nType `exclude FORCE <ADDRESS>*' to exclude without checking free space.\n", "stderr": "", "error": "exit status 1"}

After updating the cluster spec to use service IPs, a migration was triggered, creating a new set of pods and moving the data over to them. After this step the controller is stuck, because it cannot exclude the old pods. The reason is that the pods are using roughly 50% of their disks, so the free-space calculation very conservatively concludes that the old pods may not be excluded.

Would it be possible for the controller to exclude old pods in smaller batches, so that the free disk check succeeds?
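For illustration, here is a rough sketch of what batched excludes could look like if they were driven by hand with fdbcli instead of by the operator. This is not an existing operator feature; fdbcli is assumed to be on the PATH, and the cluster file path, address list, and batch size are placeholders:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// excludeInBatches excludes the given addresses a few at a time, so the
// free-space check only ever sees a small slice of capacity being removed
// at once. A plain "exclude <addr> ..." waits for the batch to be fully
// drained before returning, so data movement finishes between batches.
func excludeInBatches(clusterFile string, addresses []string, batchSize int) error {
	for start := 0; start < len(addresses); start += batchSize {
		end := start + batchSize
		if end > len(addresses) {
			end = len(addresses)
		}
		cmd := exec.Command("fdbcli", "-C", clusterFile,
			"--exec", "exclude "+strings.Join(addresses[start:end], " "))
		out, err := cmd.CombinedOutput()
		fmt.Print(string(out))
		if err != nil {
			return fmt.Errorf("excluding batch %v: %w", addresses[start:end], err)
		}
	}
	return nil
}

func main() {
	// Placeholder invocation; substitute the real cluster file and the
	// address list from the failing exclude above.
	if err := excludeInBatches("/path/to/fdb.cluster",
		[]string{"10.0.254.134", "10.244.62.11"}, 5); err != nil {
		fmt.Println(err)
	}
}
```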

The only workaround I can think of for now is to scale up until the old pods are a sufficiently small percentage of the overall cluster, but that option will not always work for us, as we have some environments that are more resource-constrained.

Re: rolling: this is covered by Support rolling update of pods on migration (FoundationDB/fdb-kubernetes-operator issue #400).

But: N bytes of storage in M nodes, doubled, is N bytes in 2M nodes; once the old nodes are excluded, that is N bytes in M nodes again. If the nodes are at 50% footprint before the migration, there should be no issue with the 10% check. The only way I can see a 10% issue arising is that either many of the new pods are not yet provisioned, or you were already down at or below 10% free before this step.
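A quick back-of-the-envelope check of that aggregate view, with made-up numbers and assuming the data is spread evenly across the nodes:

```go
package main

import "fmt"

// Aggregate-bytes view of the argument above: N units of data, M nodes
// with a fixed disk size each, at 50% footprint before the migration.
// This deliberately ignores any single disk and only looks at totals.
func main() {
	n := 500.0          // total data in the cluster (arbitrary units)
	m := 100.0          // storage nodes before the migration
	diskPerNode := 10.0 // disk capacity per node (same units)

	fmt.Println(n / (m * diskPerNode))     // 0.5  -> footprint before the migration
	fmt.Println(n / (2 * m * diskPerNode)) // 0.25 -> during the migration, data spread over 2M nodes
	fmt.Println(n / (m * diskPerNode))     // 0.5  -> after the old M nodes are excluded
}
```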

AFAIK the disk check only takes into account the most utilized disk, not the total number of bytes in the cluster (see fdbcli.actor.cpp at 1dac117543642bebbcff29623ee967145605e982 · apple/foundationdb).

Let's work it through: if your worst disk is at 50% usage and you have 50 nodes being replaced, we get:

(1 - 0.5) * 100 / (100 - 0) = 0.5 * 1 = 0.5

This is < 0.9 and so should not trigger the fault.
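To make that concrete, here is a small restatement of the check as described above. It is a paraphrase of the linked fdbcli logic, not the actual source; in particular, whether the addresses named in the current exclude command count toward the excluded total is an assumption worth verifying there:

```go
package main

import "fmt"

// Paraphrase of the free-space check described above: the data on the most
// utilized (non-excluded) disk, projected onto the storage servers that
// remain after the exclude, must stay below 90% of a disk.
func excludeTripsCheck(worstFreeSpaceRatio float64, totalSS, excludedSS int) bool {
	projected := (1 - worstFreeSpaceRatio) * float64(totalSS) / float64(totalSS-excludedSS)
	return projected > 0.9
}

func main() {
	// The worked example above: worst disk 50% used, 100 storage servers,
	// none counted as excluded -> 0.5 * 100 / 100 = 0.5, check passes.
	fmt.Println(excludeTripsCheck(0.5, 100, 0)) // false

	// If the 50 processes being replaced do count as excluded, the same
	// disks give 0.5 * 100 / 50 = 1.0 > 0.9, which would match the error
	// the operator is hitting.
	fmt.Println(excludeTripsCheck(0.5, 100, 50)) // true
}
```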

Please grab a detailed status.json from your cluster, and also check that your new nodes have been fully provisioned.