I DR'd a cluster to my own cluster with 6x3 storage servers (i.e., 3 servers per pod). The data to DR is about 330 GB of key-value pairs. It took about 4 days to get both DBs in sync. There are a couple of issues I see:
- I run dr_agent on the 3 proxy and 3 log servers, so 6 dr_agents in total, but during the operation I can see that not all the storage servers are equally busy: some are at 100% CPU while others sit at a low 10-20%. Why is the data not distributed evenly across all the servers while DR is in operation? Is there a way to improve that? (See the first sketch after this list for how these numbers can be pulled from status json.)
- Since some storage servers are busier than others, this eventually led to 2 servers lagging behind, one of them about 20k seconds behind by the time the DBs were in sync. Is that normal? (The second sketch after this list reads the per-server lag.)
- Both during the operation and after the DBs are in sync, there are about 200 GB of data in moving data and repartitioning keeps going. Is that normal? (The third sketch after this list polls the moving-data counters.)
- Once the DBs were in sync and I had a complete copy of the original DB, I aborted the DR, and the one storage server that was lagging suddenly jumped to a huge number, around the 100M level. Is that normal? I had to exclude that server, which more or less brought the system back, but I just wonder what is going on. (The last sketch after this list shows the abort and exclude steps.)
- After aborting the DR, the DB is running OK; the only thing is that repartitioning is still going on, with about 200 GB out of the 330 GB of data still moving. So, is the DB slowly rebalancing itself?
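
For reference, a minimal sketch of how the per-storage-server CPU imbalance can be read out of `fdbcli --exec "status json"`. The cluster-file path is a placeholder, and the JSON field names are my understanding of recent FDB releases, so treat it as a sketch rather than a polished tool:

```python
import json
import subprocess

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # placeholder; point at the destination cluster

# "status json" is the machine-readable version of fdbcli's status output.
raw = subprocess.run(
    ["fdbcli", "-C", CLUSTER_FILE, "--exec", "status json"],
    capture_output=True, text=True, check=True,
).stdout
status = json.loads(raw)

# Print CPU usage for every process that carries a storage role, to see
# which servers are pegged and which are nearly idle.
for proc in status["cluster"]["processes"].values():
    roles = [r.get("role") for r in proc.get("roles", [])]
    if "storage" in roles:
        cpu = proc.get("cpu", {}).get("usage_cores", 0.0)
        print(f'{proc["address"]:<22} cpu={cpu:.2f} cores  roles={roles}')
```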
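
Similarly, a sketch for reading the lag per storage server. I believe recent versions report a `data_lag` object (seconds and versions) on each storage role in `status json`; if your version differs, the field names will too:

```python
import json
import subprocess

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # placeholder

status = json.loads(subprocess.run(
    ["fdbcli", "-C", CLUSTER_FILE, "--exec", "status json"],
    capture_output=True, text=True, check=True,
).stdout)

# Report how far each storage role is behind. A healthy server should be
# a few seconds behind at most; the laggards stand out immediately.
for proc in status["cluster"]["processes"].values():
    for role in proc.get("roles", []):
        if role.get("role") == "storage":
            lag = role.get("data_lag", {})
            print(f'{proc["address"]:<22} '
                  f'lag={lag.get("seconds", 0):.0f}s '
                  f'({lag.get("versions", 0)} versions)')
```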
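
And a sketch for watching the repartitioning: `cluster.data.moving_data` in `status json` carries in-flight and queued byte counts (again, field names as I understand them), so polling it shows whether the 200 GB of moving data is actually draining:

```python
import json
import subprocess
import time

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # placeholder

# Poll the data-distribution counters once a minute; in-flight and queued
# bytes trending toward zero means the rebalance is winding down.
while True:
    status = json.loads(subprocess.run(
        ["fdbcli", "-C", CLUSTER_FILE, "--exec", "status json"],
        capture_output=True, text=True, check=True,
    ).stdout)
    data = status["cluster"]["data"]
    moving = data.get("moving_data", {})
    print(f'in_flight={moving.get("in_flight_bytes", 0) / 1e9:.1f} GB  '
          f'in_queue={moving.get("in_queue_bytes", 0) / 1e9:.1f} GB  '
          f'total_kv={data.get("total_kv_size_bytes", 0) / 1e9:.1f} GB')
    time.sleep(60)
```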
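
Finally, roughly the abort and exclude steps I described above, wrapped in a script. The cluster-file paths and the server address are placeholders, and the `fdbdr` flag names should be double-checked against `fdbdr --help` for your version:

```python
import subprocess

SOURCE_CLUSTER = "/etc/foundationdb/source.cluster"  # placeholder paths
DEST_CLUSTER = "/etc/foundationdb/dest.cluster"
LAGGING_SERVER = "10.0.0.5:4500"                     # placeholder address

# Abort the DR pair (verify the flag names with `fdbdr --help`).
subprocess.run(
    ["fdbdr", "abort", "-s", SOURCE_CLUSTER, "-d", DEST_CLUSTER],
    check=True,
)

# Exclude the lagging storage server; data distribution then moves its
# shards to the remaining servers, which can take a while to complete.
subprocess.run(
    ["fdbcli", "-C", DEST_CLUSTER, "--exec", f"exclude {LAGGING_SERVER}"],
    check=True,
)
```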