we want to use fdb on k8s via the operator.
if, however, for some reason we need to recreate the entire k8s cluster from scratch, will the fdb cluster be able to restore itself from its PVCs? what about things like the cluster file? where is that stored?
yes, the cluster file is stored in a ConfigMap and in the status field of the FoundationDBCluster object
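To make that concrete, here is a rough sketch of what the operator-managed ConfigMap looks like. The cluster name, IDs, and addresses below are hypothetical, and the exact naming conventions may differ between operator versions:

```yaml
# Illustrative sketch only -- names and keys depend on the operator version.
# The operator maintains a ConfigMap per cluster; the connection string
# (the "cluster file" contents) is one of its data entries.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-fdb-cluster-config   # "my-fdb-cluster" is a hypothetical cluster name
data:
  cluster-file: "my_fdb_cluster:abcd1234@10.0.0.1:4501,10.0.0.2:4501,10.0.0.3:4501"
```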
Just to confirm I understand the question: you are asking whether it is supported to delete the Kubernetes cluster where the FDB cluster is running, then create a new Kubernetes cluster and restore all resources into it?
In theory this should work, but it is not something that is tested. You have to make sure that you restore the exact same ConfigMap and configuration (the FoundationDBCluster resource) in the new Kubernetes cluster. You should also use the DNS feature (https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/customization.md#using-dns) to make it easier to restore the coordinators, since the new cluster will probably assign new IP addresses to those Pods. Finally, you have to make sure the PVs exist and contain the correct data.
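The DNS feature mentioned above is enabled on the FoundationDBCluster spec. A minimal sketch, assuming a hypothetical cluster name and version (the field `routing.useDNSInClusterFile` is the one described in the linked docs):

```yaml
# Sketch: coordinators are addressed by DNS names instead of Pod IPs,
# so the connection string stays valid when Pod IPs change.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: my-fdb-cluster        # hypothetical name
spec:
  version: 7.1.26             # hypothetical version
  routing:
    useDNSInClusterFile: true
```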
Another approach would be to create the new Kubernetes cluster alongside the old one and migrate the data. Depending on your configuration, you can use FDB HA for that: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/manual/fault_domains.md#multi-region-replication
so we want to deploy the fdb chart via ArgoCD. it is very possible that, due to potential future issues with ArgoCD, we might need to fully delete and recreate the fdb chart. thus we want to ensure that doing so doesn't wipe out the cluster.
if the cluster file is stored as a k8s object (a ConfigMap, as you mentioned), then that means the cluster cannot be restored by simply redeploying the fdb helm chart with the same config, right?
are the contents of the ConfigMap in question modified by FDB at runtime? is there any way to deterministically recreate it so the cluster can be restored?
seedConnectionString in the FoundationDBCluster configuration can help here, but you will need to test it. also note that the PVCs might be deleted: they carry an owner reference, so the Kubernetes garbage collector will clean them up after the FoundationDBCluster is deleted.
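A sketch of how `seedConnectionString` might be used when re-creating the resource. The connection string value is hypothetical; you would substitute the one saved from the old cluster before it was deleted:

```yaml
# Sketch: seeding a re-created FoundationDBCluster from a saved connection string.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: my-fdb-cluster        # hypothetical name
spec:
  version: 7.1.26             # hypothetical version
  # Connection string captured from the old cluster (hypothetical value).
  # Untested path -- verify the operator accepts it before relying on this.
  seedConnectionString: "my_fdb_cluster:abcd1234@10.0.0.1:4501,10.0.0.2:4501,10.0.0.3:4501"
```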
what scenarios are you trying to cover?
so basically i am trying to have the cluster recover from a failure of all nodes.
typically with a database, you only need the disks to be able to recover it, e.g. Kafka, Postgres, etc.
here with the k8s operator, it seems there is more state than what is stored on the disks. we are worried that if the k8s cluster needs to be fully recreated, or the operator fully deleted and recreated, our data would be lost, since state is not kept only on the disks.
- k8s cluster recreation: this is going to be tough. the connection string's generation ID changes over the cluster's lifetime, so you need to store the current connection string separately and provide it via seedConnectionString (assuming you are able to preserve the PVCs). in theory you can have a restore procedure for the fdb cluster after the k8s cluster is deleted, but it can be a bigger effort (not sure whether the operator will pick up the owner references for the PVCs, or how you can preserve them and how that will play out) - you will need to test it.
- if you delete the operator, it should still be fine: after the operator reconciles, it will pick up the cluster state from the status field and resume operation.
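On the "preserve the PVCs" point above: one common safeguard (not specific to FDB, just standard Kubernetes) is to provision the storage with a `Retain` reclaim policy, so the underlying PVs and their data survive even if the garbage collector deletes the PVCs. A sketch with a hypothetical StorageClass name and provisioner:

```yaml
# Sketch: with reclaimPolicy Retain, deleting a PVC releases the PV
# instead of deleting the underlying disk, so the data can be re-bound later.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fdb-retain                    # hypothetical name
provisioner: kubernetes.io/aws-ebs   # example only -- use your cluster's provisioner
reclaimPolicy: Retain
```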
on a side note: if you're afraid ArgoCD will force-recreate or delete the FoundationDBCluster, remove autosync. I'm not sure whether accidentally deleting the Application will prune all the umbrella objects.
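Beyond disabling autosync, Argo CD also supports per-resource annotations that tell it never to prune or delete a specific object. A sketch, assuming the FoundationDBCluster resource names used earlier (both sync options are documented Argo CD annotations):

```yaml
# Sketch: opt this resource out of Argo CD pruning and cascading deletion.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: my-fdb-cluster        # hypothetical name
  annotations:
    argocd.argoproj.io/sync-options: Prune=false,Delete=false
```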