Data distribution control, monitor and pause

brk0v · May 18, 2018, 1:22pm

Hi,

During my tests I add and remove nodes and change the cluster configuration (e.g. from single to double redundancy mode), and it’s not clear to me what the status of the cluster’s storage is. The status output has been showing the following unknown values for a while (2 hours so far).

Data:
  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 90.891 GB

If I run du -hs on the new members, I can see that the data directory is growing, but things move pretty slowly.

So my questions are:

Is it possible to change the speed of data migration? I don’t see a CPU/IO bottleneck on the hosts.
Can I pause data migration? For example, in case of service issues or bottlenecks.
Is there another way to see the cluster status, the amount of data left to migrate, and an ETA?

ajbeamon · May 18, 2018, 4:08pm

There aren’t any easily available controls for data distribution. There are some knobs that control its behavior, but in general changing knobs is something that should be done with great caution, and if used they have to be set at the startup of your fdbserver processes. I wouldn’t recommend changing them without a particular need, though, and I don’t personally know enough about the effects of changing the data distribution knobs to make any recommendations there.

The cluster should be able to tell you through status how much data there is to move and how much it is currently moving, and the fact that it hasn’t for 2 hours suggests that something isn’t working quite right. Are you using a build from the master branch? There have been a lot of changes here, and I think in some cases the effects on the status output haven’t been fully evaluated yet.
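For anyone who wants to pull those numbers out programmatically, here is a minimal sketch that reads the data-movement fields from a machine-readable status document. It assumes the JSON status output exposes `cluster.data.state.name`, `cluster.data.moving_data.in_flight_bytes`, `cluster.data.moving_data.in_queue_bytes` and `cluster.data.total_kv_size_bytes`; those field names and the sample values below are assumptions to check against your version, and `summarize_data_movement` is a hypothetical helper, not part of FoundationDB.

```python
import json

# Hypothetical sample shaped like the data section of a JSON status document.
# Field names are assumptions; verify them against your cluster's real output.
SAMPLE_STATUS = """
{
  "cluster": {
    "data": {
      "state": {"name": "healthy_rebalancing", "healthy": true},
      "moving_data": {"in_flight_bytes": 1500000, "in_queue_bytes": 42000000},
      "total_kv_size_bytes": 9000000000
    }
  }
}
"""


def summarize_data_movement(status):
    """Extract data-distribution fields, tolerating missing keys.

    Missing fields come back as None (or "unknown" for the state name),
    which mirrors a cluster that is still initializing data distribution.
    """
    data = status.get("cluster", {}).get("data", {})
    moving = data.get("moving_data", {})
    state = data.get("state", {})
    return {
        "state": state.get("name", "unknown"),
        "in_flight_bytes": moving.get("in_flight_bytes"),
        "in_queue_bytes": moving.get("in_queue_bytes"),
        "total_kv_size_bytes": data.get("total_kv_size_bytes"),
    }


summary = summarize_data_movement(json.loads(SAMPLE_STATUS))
print(summary)
```

On a cluster that is still (re)initializing data distribution, the same function would return "unknown" and None values rather than fail, which matches the status shown in the first post.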

There is no ETA published for data movement. It should be noted, though, that data movement can happen constantly, particularly if there are lots of writes happening.
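Since no ETA is published, the most you can do is estimate one yourself from two samples of the outstanding bytes (queued plus in flight) taken some interval apart. The sketch below is only that kind of back-of-the-envelope calculation; `estimate_eta_seconds` is a hypothetical helper, and the result is meaningless if ongoing writes keep adding new work.

```python
def estimate_eta_seconds(bytes_before, bytes_after, interval_seconds):
    """Roughly estimate the time left to drain the data-movement backlog.

    bytes_before / bytes_after: outstanding bytes at the start and end of
    the sampling interval. Returns None when the backlog is not shrinking,
    because no finite estimate exists in that case.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    drained = bytes_before - bytes_after
    if drained <= 0:
        return None  # backlog is flat or growing
    rate = drained / interval_seconds  # bytes per second over the interval
    return bytes_after / rate


# Backlog shrank from 1000 to 900 bytes in 10 seconds.
shrinking = estimate_eta_seconds(1000, 900, 10)
# Backlog grew during the interval, e.g. because of a write spike.
growing = estimate_eta_seconds(900, 1000, 10)
print(shrinking, growing)
```

A single pair of samples is noisy, so averaging over several intervals would give a steadier number.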

brk0v · May 23, 2018, 1:29pm

Thank you for the answer!

Yes, I built my server from the git tag 5.1, but I see that some merges were made after it.

What’s bothering me is that I can’t pause data movement between nodes and don’t have any tools or stats to understand what’s going on after, for example, a big insert spike.
