Regardless of replication mode, the existence of other full replicas of the same data allows FDB to resolve these issues without intervention:
Repairs, i.e. restoring full replication after a host or drive has failed, are handled by FoundationDB automatically, specifically by its data distribution subcomponent. Key ranges affected by the loss are assigned to a new team of servers. Until the data is re-replicated,
status will report decreased failure tolerance.
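You can watch that failure tolerance recover from a client. Here's a minimal sketch using the Python binding and the `\xff\xff/status/json` special key; the `fault_tolerance` field names follow recent FDB versions and may differ on yours:

```python
import json
import fdb

fdb.api_version(630)  # any API version your cluster supports
db = fdb.open()       # uses the default cluster file

@fdb.transactional
def get_status(tr):
    # The special key \xff\xff/status/json returns the same JSON
    # document that `status json` prints in fdbcli.
    raw = tr[b'\xff\xff/status/json'].wait()
    return json.loads(raw)

# While data distribution is re-replicating a lost team, this number
# drops; it recovers once the repair finishes.
ft = get_status(db)['cluster']['fault_tolerance']
print(ft['max_zone_failures_without_losing_data'])
```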
Errors, i.e. IO requests failing or transiently returning incorrect data, and Bit rot, i.e. data being silently corrupted on disk over time without being accessed, are handled similarly. An IO failure causes an
io_error to be thrown, and bit rot is detected when the data is accessed and treated as an IO error as well. The storage server then dies, effectively discarding all of its data as potentially corrupt, and the data is re-replicated from the non-corrupt nodes.
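From a client's point of view this recovery is transparent: a storage server dying mid-request surfaces as a retryable error that the bindings' standard retry loop absorbs. A minimal sketch with the Python binding (the key is a placeholder):

```python
import fdb

fdb.api_version(630)
db = fdb.open()

@fdb.transactional
def read_key(tr, key):
    # If the storage server serving this read dies (e.g. after an
    # io_error), the decorator catches the retryable FDBError, calls
    # on_error() to back off, and re-runs this function; the retried
    # read is served by a surviving replica.
    return tr[key]

value = read_key(db, b'example-key')  # placeholder key
```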
I’m interested to hear your specific reasons for desiring erasure encoding. It’d either impose additional storage overhead per node, to provide resilience for data already held elsewhere, or require reads to hit more than one replica, if you’re doing erasure codes across replicas. Neither tradeoff seems appealing to me, but I’ve never actually worked with an erasure-encoded system to see the other side of this.
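For concreteness, the back-of-the-envelope numbers behind that tradeoff (the (k, m) code parameters are illustrative only; FDB implements neither scheme beyond plain replication):

```python
def replication(copies):
    # Full replication: every byte is stored `copies` times, and a
    # read is served by a single replica.
    return {'storage_multiple': copies, 'servers_per_read': 1}

def erasure_coded(k, m):
    # A (k data + m parity) erasure code: every byte costs (k + m) / k
    # bytes of storage, but reconstructing an object requires
    # contacting k shards.
    return {'storage_multiple': (k + m) / k, 'servers_per_read': k}

print(replication(3))       # {'storage_multiple': 3, 'servers_per_read': 1}
print(erasure_coded(4, 2))  # {'storage_multiple': 1.5, 'servers_per_read': 4}
```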