Data distribution control, monitor and pause

Hi,

During my tests I add and remove nodes and change the cluster configuration (e.g. from single to double redundancy mode), and it’s not clear to me what the status of the cluster’s storage is. The status output has shown the following unknown values for a while (2 hours so far).

Data:
  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 90.891 GB

If I run du -hs on the new members, I can see that the data directory is growing, but things move pretty slowly.

So my questions are:

  • Is it possible to change the speed of data migrations? I don’t see a CPU or I/O bottleneck on the hosts.
  • Can I pause data migrations, for example in case of service issues or bottlenecks?
  • Is there another way to see the cluster status, the amount of data left to migrate, and an ETA?

There aren’t any easily available controls for data distribution. There are some knobs that control its behavior, but changing knobs should be done with great caution, and any you do change have to be set when your fdbserver processes start. I wouldn’t recommend changing them without a particular need, and I don’t personally know enough about the effects of the data distribution knobs to make any recommendations there.

The cluster should be able to tell you how much data there is to move and how much it is currently moving through status, and the fact that it hasn’t for 2 hours seems like something isn’t working quite right. Are you using a build from the master branch? There have been a lot of changes here, and I think in some cases the effects on the status output haven’t been fully evaluated yet.
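As a rough illustration of reading those numbers programmatically, here is a minimal Python sketch that pulls the data-movement fields out of the machine-readable status (the output of `status json` in fdbcli). The field names used (`cluster.data.state`, `cluster.data.moving_data.in_flight_bytes`, `in_queue_bytes`, `total_kv_size_bytes`) are assumed from the documented status schema and may differ or be missing on a given build, so every lookup falls back to None or "unknown". The function name is just for illustration.

```python
import json


def summarize_data_distribution(status_json: str) -> dict:
    """Extract data-movement fields from a `status json` document.

    Field names are assumptions based on the documented status schema;
    anything missing comes back as None (or "unknown" for the state name).
    """
    status = json.loads(status_json)
    data = status.get("cluster", {}).get("data", {})
    state = data.get("state", {})
    moving = data.get("moving_data", {})
    return {
        "state": state.get("name", "unknown"),
        "healthy": state.get("healthy"),
        "in_flight_bytes": moving.get("in_flight_bytes"),
        "in_queue_bytes": moving.get("in_queue_bytes"),
        "total_kv_size_bytes": data.get("total_kv_size_bytes"),
    }
```

One way to feed it is to capture the output of `fdbcli --exec "status json"` and pass the text in. While data distribution is still initializing, the movement fields will probably be absent, which matches the "unknown" values in the human-readable status.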

There is no ETA published for data movement. It should be noted, though, that data movement can happen constantly, particularly if there are lots of writes happening.
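If you want a ballpark figure anyway, you can derive one yourself from two status samples taken some time apart. The sketch below is only that: it assumes "remaining" means queued plus in-flight bytes from status (my assumption, not an official metric), and it returns None when the backlog isn't shrinking, which will happen whenever incoming writes create new movement as fast as the old movement drains.

```python
def estimate_eta_seconds(remaining_before, remaining_after, interval_seconds):
    """Estimate seconds until the movement backlog reaches zero.

    remaining_before / remaining_after: bytes still to move at the start
    and end of the sampling interval (assumed here to be queued plus
    in-flight bytes). Returns None if the backlog did not shrink.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    drained = remaining_before - remaining_after
    if drained <= 0:
        return None  # backlog is flat or growing, so no meaningful ETA
    # remaining / (drained / interval), rearranged to avoid a rate variable
    return remaining_after * interval_seconds / drained
```

For example, if the backlog drops from 10,000 to 8,000 bytes over 60 seconds, the estimate is 240 seconds. Treat the number as a trend indicator rather than a promise, since the drain rate is unlikely to stay constant.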

Thank you for the answer!

Yes, I built my server from the git tag 5.1, but I see that some merges were made after it.

What’s bothering me is that I can’t pause data movement between nodes and don’t have tools or stats that show what’s going on, for example after a big insert spike.