Restore is slow and parallel restore doesn't achieve performance boost

Hi Steve, I went through all the items in your list one by one. We are fine with the first 4 items, but I expanded the proxies and Tx logs. I've made some progress with restores. For the 50GB db:

  • A few days ago it took 2+ hours.
  • Yesterday's restore was 1 hr 10 min. I didn't change anything for that restore itself, but we happened to upgrade the Kubernetes cluster by 3 minor versions 2 days ago, which may have improved the speed.
  • Today it's 55 min, after adding more Tx logs and proxies to the FDB cluster following your advice (roughly as sketched below).
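For reference, the change on the FDB side was along these lines. The process counts below are only an illustration of what we tried (not a recommendation), and assume the single pre-7.0 proxies/logs knobs we have been discussing in this thread:

```bash
# Bump the proxy and transaction-log counts (illustrative numbers only),
# then add matching proxy/log class processes to the cluster's pods.
fdbcli --exec 'configure proxies=5 logs=8'
fdbcli --exec 'status details'   # confirm the new roles were recruited
```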

I found something interesting on our Tess internal wiki page about K8s volumes: local-dynamic is 3x slower in IO performance than local-ssd (10K IOPS for local-dynamic vs. 30K for local-ssd). In the test K8s cluster I used local-dynamic, while the big prod db uses local-ssd, so the test db sits on slower SSDs than the prod db.

Quite a coincidence! Could the 3x slower IO of local-dynamic in the test env be (at least partially) responsible for our restore throughput being roughly 3x slower than Apple's 50 MB/sec?
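A quick back-of-the-envelope check with my own numbers (50 GB restored in today's ~55 min run) puts us at roughly a third of Apple's figure, so at least the ratios line up:

```bash
# 50 GB restored in ~55 min, in MB/sec
echo "scale=1; 50 * 1024 / (55 * 60)" | bc   # ≈ 15.5 MB/sec, vs. Apple's ~50 MB/sec
```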

I'll find out when I do restore testing against the prod db. We have also planned the K8s upgrade for that prod cluster, so I will test after the upgrade.

Thank you.

Leo