Redwood: perpetual data movement between storage servers

Hello, all.

I’m testing an FDB (version 7.1.21) cluster with the Redwood storage engine. The cluster has three nodes, each running one stateless process, one transaction process, and two storage servers.

The problem is that the cluster constantly moves data between storage servers: periodically, each storage server is drained of all its data and then filled with data again.

What is the reason for this behavior?

Do you have the perpetual wiggle enabled? This is a feature that constantly removes a storage server and then re-adds it (basically doing exclude/include in a loop).

The reason we do this is that rewriting B-trees compacts them nicely. SQLite also benefits from this (maybe even more than Redwood). But this behavior can be turned off.

I think you can turn it off using the following fdbcli command:

configure perpetual_storage_wiggle=0

Thank you, @markus.pilman for the fast answer.

Yes, I have perpetual_storage_wiggle=1 because it was suggested when I was changing the storage engine to redwood-1-experimental.

I’ll try to repeat my tests with perpetual_storage_wiggle=0

I received a warning:

fdb> configure perpetual_storage_wiggle=0
Configuration changed, with warnings
WARN: To make progress toward the desired storage type with storage_migration_type=gradual, the Perpetual Wiggle must be enabled.
Type `configure perpetual_storage_wiggle=1' to enable the perpetual wiggle, or `configure storage_migration_type=gradual' to set the gradual migration type.

What are the correct next steps for me? This database was previously configured with the ssd storage engine, but that migration has already completed. Do I need to change any configuration to continue with Redwood?

I think this means that the migration hasn’t finished yet. You need to wait until each storage server has gone through this transition once. Basically, this storage migration tries to migrate your data slowly so as not to overwhelm the cluster.

If you don’t care (mostly if you don’t need this cluster to be available), you can do an aggressive migration.

The warning is just saying that you have selected a storage migration mode of “gradual” but that with the PW disabled no progress in migrating to a new storage engine would actually occur if you changed storage engines.

Unfortunately status currently only reports the configured storage engine mode but it doesn’t tell you how many of each storage engine type you have at the moment. The JSON status document has this though, under each storage role there is a metadata object that contains the storage engine type for each storage instance.
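
As a rough illustration of reading that information out of the JSON status document, here is a minimal Python sketch that counts storage roles per engine type. The field names `storage_metadata` and `storage_engine`, and the `cluster.processes.*.roles` layout, are assumptions about the status schema and may differ between versions, so check them against your own `status json` output. `count_storage_engines` and `fetch_status` are hypothetical helper names.

```python
import json
import subprocess
from collections import Counter


def count_storage_engines(status: dict) -> Counter:
    """Count storage roles per storage engine type in a parsed status document."""
    counts = Counter()
    processes = status.get("cluster", {}).get("processes", {})
    for process in processes.values():
        for role in process.get("roles", []):
            if role.get("role") != "storage":
                continue
            # Assumed field names; adjust them to match your version's status JSON.
            metadata = role.get("storage_metadata", {})
            counts[metadata.get("storage_engine", "unknown")] += 1
    return counts


def fetch_status() -> dict:
    """Run fdbcli and parse its JSON status output (requires fdbcli on PATH)."""
    result = subprocess.run(
        ["fdbcli", "--exec", "status json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

On a live cluster, `count_storage_engines(fetch_status())` would show how many storage servers are still on the old engine, assuming the field names above match.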

I’m sorry for the late response.

The problem was that the migration had finished several days earlier (each storage server had a Redwood file and no SQLite files), but FDB did not detect that it had completed and thought it was still in progress.

Unfortunately, I’ve already rebuilt this test environment and created a new FDB cluster with Redwood, so I can’t provide any additional information. As far as I remember, I took the following steps:

  1. configure new ssd double
  2. Fill the database with data.
  3. clearrange '' \xff. This logically erased the data, but the SQLite files were still large.
  4. configure ssd-redwood-1-experimental. This required setting perpetual_storage_wiggle=1 and started a migration that finished immediately because the database was empty. The SQLite files disappeared.
  5. Fill the database with data.
  6. Storage servers were constantly being removed and re-added.
  7. After configure perpetual_storage_wiggle=0, I received the message WARN: To make progress toward the desired storage type with storage_migration_type=gradual, the Perpetual Wiggle must be enabled.

The Perpetual Wiggle is called that because it is perpetual while enabled. It’s useful to rebuild BTrees every once in a while, as this is a very efficient way to remove slack space, but unfortunately in FDB 7.1 the process goes too fast. It constantly removes and re-adds storage processes one at a time, with no pause in between, which is too fast for most clusters.

In FDB 7.1 you can fix this by setting a delay between processes using the PERPETUAL_WIGGLE_DELAY knob. I suggest 2000000; the value is in seconds, so this is about 23 days.
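
To check the arithmetic behind that suggestion, a small Python sketch converts a delay in seconds to days. `seconds_to_days` is a hypothetical helper used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


delay_sec = 2_000_000  # suggested PERPETUAL_WIGGLE_DELAY value, in seconds
print(f"{delay_sec} s = {seconds_to_days(delay_sec):.1f} days")
# prints "2000000 s = 23.1 days"
```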

In FDB 7.2 and later, the Perpetual Wiggle has a new concept of “minimum storage server age” which defaults to 3 weeks and is controlled by a knob DD_STORAGE_WIGGLE_MIN_SS_AGE_SEC, also in seconds. The logic here is that if a Storage Server is using the desired configured Storage Engine type, then it will not be “wiggled” out/in until it is at least this old.
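
The minimum-age rule can be pictured with a short Python sketch. This is an illustration of the logic as described above, not FDB’s actual implementation: the function name is hypothetical, and treating a storage server on the wrong engine as always eligible is an assumption inferred from the description.

```python
THREE_WEEKS_SEC = 21 * 24 * 60 * 60  # 1814400 seconds, the described default


def eligible_for_wiggle(
    engine: str,
    desired_engine: str,
    age_sec: float,
    min_age_sec: float = THREE_WEEKS_SEC,
) -> bool:
    """Return True if a storage server may be wiggled out and back in."""
    if engine != desired_engine:
        # Assumption: a server on the wrong engine is eligible regardless of
        # age, so that a gradual migration can keep making progress.
        return True
    # Already on the desired engine: wait until the server is old enough.
    return age_sec >= min_age_sec
```

For example, under these assumptions a one-day-old Redwood storage server in a cluster configured for Redwood would be left alone, while an ssd-2 storage server in the same cluster would be eligible immediately.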

While the default ssd / SQLite based engine does not need the PW, many workloads will see an increase of CPU and IO overhead per operation over time as the BTree becomes more fragmented, which is mitigated by the PW periodically draining, recreating, and refilling it.

Strictly speaking, Redwood does need the PW at least for some workloads, though most probably could run without it. Redwood makes some design tradeoffs to reduce the IO and CPU cost of writes, specifically not merging under-filled sibling pages and not maintaining internal metadata necessary to shrink data files to give internal free space back to the filesystem. These decisions make sense in the context of a production FDB cluster with dedicated data volumes, as giving space back to the filesystem is unnecessary (it will be reused within Redwood) and FDB Data Distribution will move/balance key ranges around between Storage Servers which will effectively remove BTree slack and keep logical space usage roughly balanced.

If, however, your workload were to have an update pattern that produces an unusual amount of slack space constantly, or if you were to delete a large amount of data from your cluster and not replace it and you wanted all of your data files to shrink instead of holding free space internally, then the Perpetual Wiggle is a backstop to control slack and shrink data files.
