We set up a set of backup nodes/pods separate from regular data nodes. We are very happy that backup with multiple backup nodes have sped up significantly. For example, with a cluster of 50GB data in test env, with a single backup node with multiple agents, the backup takes 9min. With 2 backup nodes, it’s 5 mins; with 4 nodes, it’s 3min. With a large prod db of 3TB data in prod, we are able to reduce the backup time from 13.5 hours to 2.5 hours with six backup nodes.
On the other hand, I cannot achieve any performance boost for RESTORE with N nodes. Restores are all very slow. For the 50GB db in triple replication mode, I tested 4 times. It took 2 to 3 hours to restore. The durations were similar whether it’s one backup node or N backup nodes. It seems there has been NO parallelism for restore.
I haven’t tested restoring for the large db in prod with N backup nodes yet. In the past, I did restore with 1 backup node, and it took 27 hours for 1.5TB.
The restore seems to be quite slow. Is this normal with the numbers above?
Why are the backup time and the restore time so different (3 mins vs 2.5 hours)?
Do I miss something with parallel restore? How do I troubleshoot it?