Steve, quite interesting.
For the test dbs, I made the backups after the data loading had finished, and no changes were made to the db while the backups were running. I assume that means there were no mutations during the backup window.
I first loaded 50GB of data, and tested backups and restores. Since the restores took 2+ hours, I shrank the db to 15GB by clearing ranges of keys. After that, I tested again.
Here is the process count report of my cluster config:
CNT  Role
---- -------------
1    cluster_controller
3    coordinator
1    data_distributor
5    log
1    master
3    proxy
1    ratekeeper
1    resolver
30   storage
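In case it helps to reproduce the count, here is a minimal Python sketch of how such a report could be derived from parsed `status json` output. It assumes the usual layout where `cluster.processes` maps process IDs to objects carrying a `roles` list; the sample data and the helper name `count_roles` are made up for illustration, not taken from my cluster.

```python
import json
from collections import Counter


def count_roles(status: dict) -> Counter:
    """Count how many processes hold each role in a parsed status document."""
    counts = Counter()
    processes = status.get("cluster", {}).get("processes", {})
    for proc in processes.values():
        for role in proc.get("roles", []):
            counts[role.get("role", "unknown")] += 1
    return counts


# Hypothetical, heavily trimmed sample (assumed layout, invented process IDs).
sample = json.loads("""
{"cluster": {"processes": {
  "p1": {"roles": [{"role": "log"}]},
  "p2": {"roles": [{"role": "storage"}, {"role": "coordinator"}]},
  "p3": {"roles": [{"role": "storage"}]}
}}}
""")

role_counts = count_roles(sample)
for role, cnt in sorted(role_counts.items()):
    print(f"{cnt:<4d} {role}")
```

With real output, the dict would come from something like `json.load()` on a saved copy of the status document instead of the inline sample.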
I have 5 Tx log processes on 5 different pods. Is that enough?
I created 6 backup pods with 6 agents on each pod, for a total of 36 agents. I checked the status json output, and all 36 agents are connected to the db. From the status json output:
"cluster" : {
  "clients" : {
    "count" : 1,
    "supported_versions" : [
      {
        "client_version" : "Unknown",
        "connected_clients" : [
          {
            "address" : "10.69.196.45:54386:tls",
            "log_group" : "default"
          },
          ...... (35 more similar entries)
        ],
        "count" : 36,
        "protocol_version" : "Unknown",
        "source_version" : "Unknown"
      },
…
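For what it's worth, the connection check can be scripted rather than read by eye. This is a rough Python sketch that counts distinct client addresses under `cluster.clients.supported_versions[*].connected_clients`, the same fields shown in the excerpt. The sample document and the helper name `connected_addresses` are hypothetical. Note that it counts every connected client, not only backup agents, so it is only a sanity check.

```python
import json


def connected_addresses(status: dict) -> set:
    """Collect distinct client addresses listed under supported_versions."""
    clients = status.get("cluster", {}).get("clients", {})
    addresses = set()
    for version in clients.get("supported_versions", []):
        for client in version.get("connected_clients", []):
            addresses.add(client.get("address"))
    return addresses


# Hypothetical sample with two clients (addresses invented for illustration).
sample = json.loads("""
{"cluster": {"clients": {"supported_versions": [
  {"connected_clients": [
    {"address": "10.0.0.1:50000:tls", "log_group": "default"},
    {"address": "10.0.0.2:50001:tls", "log_group": "default"}
  ]}
]}}}
""")

found = connected_addresses(sample)
print(f"{len(found)} distinct clients connected")
```

Using a set avoids double counting if the same address appears under more than one version entry.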
Anything I should change?
Thanks.