Design and Implementation of a Performant Restore System in FDB

@osamarin Thank you very much for the interesting idea and detailed description!

My comments, from high-level to detailed:

  1. Your proposal makes sense. I had a similar thought and filed an issue here: https://github.com/apple/foundationdb/issues/2127. I guess we are on the same page. :wink:

  2. I think the DR-based backup approach is an interesting idea and I personally like it (at least it is a good direction to explore).

However, it does raise a major concern:
FDB now has a multi-region configuration (also called fearless configuration), which no longer has a separate FDB cluster in the remote DC. The multi-region configuration will become the recommended configuration for high-availability services.
The DR-based backup and restore solution won’t work out of the box for the multi-region configuration.

  3. About the speed of the file-based backup and restore in your evaluation: the backup data is 12.2 GB and the restore time is 5m44s, so the restore speed is 12.2 * 1024 MB / 344 s ~= 36 MB/s. That is still slow for a large backup, and the current fdb restore can reach up to 100 MB/s (IIRC).
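To make the arithmetic above easy to re-check, here is a minimal Python sketch of the same calculation. The function name is just something I made up for illustration, and the 100 MB/s reference figure is from my memory, not a measured number.

```python
def restore_throughput_mb_per_s(size_gb: float, minutes: int, seconds: int) -> float:
    """Average restore throughput in MB/s, treating 1 GB as 1024 MB."""
    total_mb = size_gb * 1024
    total_seconds = minutes * 60 + seconds
    return total_mb / total_seconds


# Numbers from the evaluation: 12.2 GB restored in 5m44s.
measured = restore_throughput_mb_per_s(12.2, 5, 44)

# Assumption: ~100 MB/s for the current fdb restore (recalled from memory).
reference = 100.0

print(f"measured restore speed: {measured:.1f} MB/s")
print(f"reference / measured:   {reference / measured:.2f}x")
```

With these inputs the measured speed comes out to about 36 MB/s, so the reference figure would be a bit under 3x faster, if my memory of it is right.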

Intuitively, I think the file-based restore should be much faster than the measured speed. Did you happen to measure where the time is spent, and whether the proposed restore can scale to run faster?