Never assign Storage role to Log processes

Is there a way to prevent FoundationDB from ever assigning the storage role to log-class processes?
According to the Roles / Classes matrix and Locality.cpp, the storage role has WorstFit fitness for log-class processes. For our use case it would be preferable for it to be NeverAssign, since assigning the storage role to a log process is likely to render the cluster unavailable.
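
To illustrate the distinction for readers who haven't looked at the fitness machinery: below is a minimal, self-contained sketch of the concept. The enum values, the storageFitness mapping, and the recruitStorage function are invented for illustration and are not the actual Locality.cpp code; the sketch only shows why WorstFit still leaves log processes eligible as a last resort, whereas NeverAssign would rule them out entirely.

```cpp
// Simplified illustration of fitness-based role recruitment (NOT the real
// FoundationDB Locality.cpp; names and values are made up to mirror the
// concept discussed above).
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Lower value == better fit for the role being recruited.
enum class Fitness { BestFit = 0, GoodFit = 1, WorstFit = 2, NeverAssign = 3 };

struct Process {
    std::string address;
    std::string processClass; // "storage", "log", ...
};

// Hypothetical mapping of process class -> fitness for the storage role,
// mirroring the Roles / Classes matrix referenced above.
Fitness storageFitness(const Process& p) {
    if (p.processClass == "storage") return Fitness::BestFit;
    if (p.processClass == "log") return Fitness::WorstFit;   // current behaviour
    // if (p.processClass == "log") return Fitness::NeverAssign; // what we'd prefer
    return Fitness::GoodFit;
}

// Pick the best available candidate. NeverAssign processes are never
// considered; WorstFit processes are considered only when nothing better
// is available.
std::optional<Process> recruitStorage(const std::vector<Process>& candidates) {
    std::optional<Process> best;
    Fitness bestFit = Fitness::NeverAssign;
    for (const auto& p : candidates) {
        Fitness f = storageFitness(p);
        if (f == Fitness::NeverAssign) continue; // hard exclusion
        if (!best || f < bestFit) {
            best = p;
            bestFit = f;
        }
    }
    return best;
}

int main() {
    // With all storage-class processes excluded, only a log process remains,
    // and WorstFit still lets it win the recruitment.
    std::vector<Process> available = { { "10.0.0.5:4501", "log" } };
    if (auto chosen = recruitStorage(available))
        std::cout << "Recruited storage on " << chosen->address << " ("
                  << chosen->processClass << " class)\n";
    else
        std::cout << "No eligible storage candidate\n";
    return 0;
}
```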

Context:
We have recently experienced, on two occasions, log processes being assigned the storage role while storage processes were being excluded. The log processes run on much smaller disks than the storage processes (64GB vs 1TB). Both times the FoundationDB cluster became effectively unavailable, as the disks of the log processes filled up during data rebalancing.

In each case the cluster was healthy and fully replicated (triple redundancy), and one or more exclusions of storage processes, either for replacement or for shrinking the cluster, caused FoundationDB to assign storage roles to log processes.

We run FoundationDB on Kubernetes through the fdb-kubernetes-operator (v1.28.1).
FoundationDB version: 7.1.43

Do you know why those log processes have the storage role assigned? And is the storage role removed after the excluded processes are removed? What fault domains does your setup use and how were the processes distributed at the time? Just wondering why FDB was choosing to use some log processes as storage processes.

I’ll try to answer as well as I can with the information I’ve collected so far.

Do you know why those log processes have the storage role assigned?

We see events related to this:

<Event Severity="20" Time="1702069368.093155" DateTime="2023-12-08T21:02:48Z" Type="DDRecruitingEmergency" ID="cb30a25d47482259" Primary="1" ThreadID="10229846978395254337" Machine="10.115.13.206:4501" LogGroup="foundationdb-cluster" Roles="DD,MS,RK,RV" />
<Event Severity="20" Time="1702069368.291899" DateTime="2023-12-08T21:02:48Z" Type="RecruitStorageNotAvailable" ID="393d4a0abb4b842f" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" IsCriticalRecruitment="1" ThreadID="11583647727294248172" Machine="10.115.13.224:4501" LogGroup="foundationdb-cluster" Roles="CC,GP" />

This seems to be caused by hasHealthyTeam (healthyTeamCount != 0) being false [source].
Theory: Without digging further into the code, I would assume a team is considered unhealthy when one of its processes is excluded. → If every team contains at least one excluded process, FoundationDB starts critical recruitment, assigning storage roles to log processes.
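
To make the theory concrete, here is a rough, hypothetical sketch of the suspected chain of events. The Server, Team, and isHealthy names are made up for illustration and this is not the actual data-distribution code; it only mirrors the idea that healthyTeamCount dropping to 0 flips recruitment into critical mode (the IsCriticalRecruitment="1" flag in the event above).

```cpp
// Hypothetical sketch of the suspected trigger for critical recruitment.
// NOT the actual FoundationDB data-distribution code; types and names are
// invented to illustrate the theory described above.
#include <iostream>
#include <string>
#include <vector>

struct Server {
    std::string address;
    bool excluded = false;
};

// A replication team: one copy of the data lives on each member.
using Team = std::vector<Server>;

// Theory: a team counts as healthy only if none of its members is excluded.
bool isHealthy(const Team& team) {
    for (const auto& s : team)
        if (s.excluded) return false;
    return true;
}

int main() {
    // Triple redundancy with 3 storage servers, one of them excluded:
    // the only possible team contains the excluded server, so
    // healthyTeamCount drops to 0.
    Server a{ "10.0.0.1:4501", /*excluded=*/true };
    Server b{ "10.0.0.2:4501" };
    Server c{ "10.0.0.3:4501" };
    std::vector<Team> teams = { { a, b, c } };

    int healthyTeamCount = 0;
    for (const auto& t : teams)
        if (isHealthy(t)) ++healthyTeamCount;

    bool isCriticalRecruitment = (healthyTeamCount == 0);
    std::cout << "healthyTeamCount=" << healthyTeamCount
              << " criticalRecruitment=" << std::boolalpha
              << isCriticalRecruitment << "\n";
    // Under critical recruitment, WorstFit candidates (e.g. log-class
    // processes) become acceptable for the storage role, which matches what
    // we observed: RecruitStorageNotAvailable followed by storage roles
    // landing on log processes.
    return 0;
}
```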

What fault domains does your setup use and how were the processes distributed at the time?

In both cases we ran with a single pod (process) per machine (k8s node), with each machine being a fault domain. Triple redundancy.

First case:
3 storage processes. Replacing one storage process. One exclusion + one new storage process (new pod and machine) joining the cluster.

Second case:
30 storage processes. Scale down to 6 storage processes (through k8s operator). 24 exclusions.

And is the storage role removed after the excluded processes are removed?

We didn’t observe this, as most of the excluded processes never managed to move their data off before the log disks filled up, i.e. the excluded processes were never removed.