I DR'd a cluster to my own cluster with 6x3 storage servers (i.e., 3 servers per pod). The data to DR is about 330 GB of key-value pairs. It took about 4 days to get both DBs in sync. There are a couple of issues I see:
- I run dr_agent on the 3 proxy and 3 log servers, so 6 dr_agents in total, but during the operation I can see that not all the storage servers are equally busy: some are at 100% CPU while others sit at a low 10-20%. Why is the data not distributed evenly across all the servers while DR is in operation? Is there a way to improve that? (See the first sketch after this list for how these numbers can be pulled from status json.)
- Since some storage servers are busier than others, this eventually led to 2 servers lagging behind, one of them about 20k seconds behind by the time the DBs were in sync. Is that normal? (The second sketch after this list reads the per-server lag.)
- Both during the operation and after the DBs are in sync, there are about 200 GB of data in moving data and repartitioning keeps going. Is that normal? (The third sketch after this list polls the moving-data counters.)
- Once the DBs were in sync and I had a complete copy of the original DB, I aborted the DR, and the one storage server that was lagging suddenly jumped to a huge number, around the 100M level. Is that normal? I had to exclude that server, which more or less brought the system back, but I just wonder what is going on. (The last sketch after this list shows the abort and exclude steps.)
- After aborting the DR, the DB is running OK; the only thing is that repartitioning is still going on, with about 200 GB out of the 330 GB of data still moving. So, is the DB slowly rebalancing itself?
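
For reference, a minimal sketch of how the per-storage-server CPU imbalance can be read out of `fdbcli --exec "status json"`. The cluster-file path is a placeholder, and the JSON field names are my understanding of recent FDB releases, so treat it as a sketch rather than a polished tool:

```python
import json
import subprocess

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # placeholder; point at the destination cluster

# "status json" is the machine-readable version of fdbcli's status output.
raw = subprocess.run(
    ["fdbcli", "-C", CLUSTER_FILE, "--exec", "status json"],
    capture_output=True, text=True, check=True,
).stdout
status = json.loads(raw)

# Print CPU usage for every process that carries a storage role, to see
# which servers are pegged and which are nearly idle.
for proc in status["cluster"]["processes"].values():
    roles = [r.get("role") for r in proc.get("roles", [])]
    if "storage" in roles:
        cpu = proc.get("cpu", {}).get("usage_cores", 0.0)
        print(f'{proc["address"]:<22} cpu={cpu:.2f} cores  roles={roles}')
```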
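
Similarly, a sketch for reading the lag per storage server. I believe recent versions report a `data_lag` object (seconds and versions) on each storage role in `status json`; if your version differs, the field names will too:

```python
import json
import subprocess

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # placeholder

status = json.loads(subprocess.run(
    ["fdbcli", "-C", CLUSTER_FILE, "--exec", "status json"],
    capture_output=True, text=True, check=True,
).stdout)

# Report how far each storage role is behind. A healthy server should be
# a few seconds behind at most; the laggards stand out immediately.
for proc in status["cluster"]["processes"].values():
    for role in proc.get("roles", []):
        if role.get("role") == "storage":
            lag = role.get("data_lag", {})
            print(f'{proc["address"]:<22} '
                  f'lag={lag.get("seconds", 0):.0f}s '
                  f'({lag.get("versions", 0)} versions)')
```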
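
And a sketch for watching the repartitioning: `cluster.data.moving_data` in `status json` carries in-flight and queued byte counts (again, field names as I understand them), so polling it shows whether the 200 GB of moving data is actually draining:

```python
import json
import subprocess
import time

CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"  # placeholder

# Poll the data-distribution counters once a minute; in-flight and queued
# bytes trending toward zero means the rebalance is winding down.
while True:
    status = json.loads(subprocess.run(
        ["fdbcli", "-C", CLUSTER_FILE, "--exec", "status json"],
        capture_output=True, text=True, check=True,
    ).stdout)
    data = status["cluster"]["data"]
    moving = data.get("moving_data", {})
    print(f'in_flight={moving.get("in_flight_bytes", 0) / 1e9:.1f} GB  '
          f'in_queue={moving.get("in_queue_bytes", 0) / 1e9:.1f} GB  '
          f'total_kv={data.get("total_kv_size_bytes", 0) / 1e9:.1f} GB')
    time.sleep(60)
```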
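
Finally, roughly the abort and exclude steps I described above, wrapped in a script. The cluster-file paths and the server address are placeholders, and the `fdbdr` flag names should be double-checked against `fdbdr --help` for your version:

```python
import subprocess

SOURCE_CLUSTER = "/etc/foundationdb/source.cluster"  # placeholder paths
DEST_CLUSTER = "/etc/foundationdb/dest.cluster"
LAGGING_SERVER = "10.0.0.5:4500"                     # placeholder address

# Abort the DR pair (verify the flag names with `fdbdr --help`).
subprocess.run(
    ["fdbdr", "abort", "-s", SOURCE_CLUSTER, "-d", DEST_CLUSTER],
    check=True,
)

# Exclude the lagging storage server; data distribution then moves its
# shards to the remaining servers, which can take a while to complete.
subprocess.run(
    ["fdbcli", "-C", DEST_CLUSTER, "--exec", f"exclude {LAGGING_SERVER}"],
    check=True,
)
```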