Redwood: data perpetually moving between storage servers

osamarin · October 26, 2022, 3:54pm

Hello, all.

I’m testing an FDB (version 7.1.21) cluster with the Redwood storage engine. The cluster contains three nodes, each running one stateless process, one transaction process, and two storage servers.

The problem is that the cluster constantly moves data between storage servers: periodically, each storage server has all of its data moved out and is then filled with data again.

What is the reason for this behavior?

markus.pilman · October 26, 2022, 4:10pm

Do you have the perpetual wiggle enabled? This is a feature that constantly removes a storage server and then re-adds it (basically doing exclude/include in a loop).

The reason we do this is that rewriting B-trees compacts them nicely. SQLite also benefits from this (maybe even more than Redwood). But this behavior can be turned off.

I think you can turn it off using the following fdbcli command:

configure perpetual_storage_wiggle=0

osamarin · October 26, 2022, 4:41pm

Thank you, @markus.pilman, for the fast answer.

Yes, I have perpetual_storage_wiggle=1 because it was suggested when I was changing the storage engine to redwood-1-experimental.

I’ll try to repeat my tests with perpetual_storage_wiggle=0.

osamarin · October 26, 2022, 4:45pm

I received a warning:

fdb> configure perpetual_storage_wiggle=0
Configuration changed, with warnings
WARN: To make progress toward the desired storage type with storage_migration_type=gradual, the Perpetual Wiggle must be enabled.
Type `configure perpetual_storage_wiggle=1' to enable the perpetual wiggle, or `configure storage_migration_type=gradual' to set the gradual migration type.

What are the correct next steps for me? This database was previously configured with the ssd storage engine, but that migration has already completed. Do I need to change any configuration to continue with Redwood?

markus.pilman · October 26, 2022, 10:32pm

I think this means that the migration hasn’t finished yet. You need to wait until each storage server has gone through this transition once. Basically, this storage migration tries to migrate your data slowly so as not to overwhelm the cluster.

If you don’t care (mostly, if you don’t need this cluster to be available), you can do an aggressive migration.

SteavedHams · November 11, 2022, 6:25am

The warning is just saying that you have selected a storage migration mode of “gradual” but that with the PW disabled no progress in migrating to a new storage engine would actually occur if you changed storage engines.

Unfortunately, status currently only reports the configured storage engine mode; it doesn’t tell you how many storage servers of each engine type you have at the moment. The JSON status document has this, though: under each storage role there is a metadata object that contains the storage engine type for that storage instance.
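To make the JSON status suggestion concrete, here is a minimal Python sketch that tallies storage roles by engine type. The key names `storage_metadata` and `storage_engine`, and the sample document itself, are assumptions for illustration; please check them against the real output of `status json` on your own cluster before relying on this.

```python
from collections import Counter


def count_storage_engines(status: dict) -> Counter:
    """Count storage roles by the engine type recorded in their metadata."""
    counts = Counter()
    processes = status.get("cluster", {}).get("processes", {})
    for process in processes.values():
        for role in process.get("roles", []):
            if role.get("role") != "storage":
                continue
            # ASSUMPTION: the metadata object is called "storage_metadata"
            # and holds the engine name under "storage_engine".
            metadata = role.get("storage_metadata", {})
            counts[metadata.get("storage_engine", "unknown")] += 1
    return counts


# Hypothetical, heavily trimmed status document (not real cluster output).
sample_status = {
    "cluster": {
        "processes": {
            "proc-a": {"roles": [
                {"role": "storage",
                 "storage_metadata": {"storage_engine": "ssd-redwood-1-experimental"}},
                {"role": "log"},
            ]},
            "proc-b": {"roles": [
                {"role": "storage",
                 "storage_metadata": {"storage_engine": "ssd-2"}},
            ]},
            "proc-c": {"roles": [
                {"role": "storage"},  # no metadata reported
            ]},
        }
    }
}

print(dict(count_storage_engines(sample_status)))
```

With a real cluster you would save the output of `fdbcli --exec 'status json'` to a file and pass the result of `json.load` to the function instead of the sample dictionary. If every storage role reports the configured engine, the migration has reached all storage servers.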

osamarin · November 15, 2022, 8:23am

I’m sorry for the late response.

The problem was that the migration had finished several days earlier (each storage server had a Redwood file and no SQLite files), but FDB did not detect the completion and thought the migration was still in progress.

Unfortunately, I’ve already recreated this test environment and created the database with Redwood from the start, so I can’t provide any additional information. As far as I remember, I took the following steps:

1. configure new ssd double
2. Fill the database with data.
3. clearrange '' \xff. This logically erased the data, but the SQLite files were still large.
4. configure ssd-redwood-1-experimental. This required setting perpetual_storage_wiggle=1 and started a migration that finished immediately because the database was empty. The SQLite files disappeared.
5. Fill the database with data again.
6. Constant removal of storage servers occurred.
7. After configure perpetual_storage_wiggle=0, I received the message: WARN: To make progress toward the desired storage type with storage_migration_type=gradual, the Perpetual Wiggle must be enabled.

SteavedHams · November 15, 2022, 9:56am

The Perpetual Wiggle is called this because it is perpetual while enabled. It is useful to rebuild BTrees every once in a while, as it is a very efficient way to remove slack space, but unfortunately in FDB 7.1 the process goes too fast. It constantly removes and re-adds storage processes one at a time, with no pause in between, which is too fast for most clusters.

In FDB 7.1 you can fix this by setting a delay between processes using the PERPETUAL_WIGGLE_DELAY knob. I suggest 2000000, which is in seconds, or about 23 days.
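For anyone who wants to pick a different delay, the conversion behind the "about 23 days" figure is simple arithmetic. The helper name below is made up for illustration and is not part of FDB.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_to_days(seconds: float) -> float:
    """Convert a knob value given in seconds to days."""
    return seconds / SECONDS_PER_DAY


# The suggested delay of 2,000,000 seconds, rounded to one decimal place.
print(round(seconds_to_days(2_000_000), 1))  # 23.1
```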

In FDB 7.2 and later, the Perpetual Wiggle has a new concept of “minimum storage server age” which defaults to 3 weeks and is controlled by a knob DD_STORAGE_WIGGLE_MIN_SS_AGE_SEC, also in seconds. The logic here is that if a Storage Server is using the desired configured Storage Engine type, then it will not be “wiggled” out/in until it is at least this old.
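The age rule described above can be sketched as a small predicate. This is only an illustration of the logic as stated in this post, not FDB's actual implementation: the `StorageServer` type and function name are hypothetical, and the assumption that a server on the wrong engine is eligible regardless of age is my reading of the description.

```python
from dataclasses import dataclass

# Three weeks expressed in seconds, matching the default described above.
THREE_WEEKS_SEC = 21 * 24 * 60 * 60  # 1,814,400


@dataclass
class StorageServer:
    """Hypothetical stand-in for a storage server's relevant state."""
    engine: str
    age_sec: float


def eligible_for_wiggle(ss: StorageServer, desired_engine: str,
                        min_age_sec: float = THREE_WEEKS_SEC) -> bool:
    """Return True if the server may be wiggled under the min-age rule."""
    # ASSUMPTION: a server on the wrong engine is always eligible,
    # so that a gradual migration can make progress.
    if ss.engine != desired_engine:
        return True
    # A server already on the desired engine waits until it is old enough.
    return ss.age_sec >= min_age_sec


young = StorageServer("ssd-redwood-1-experimental", age_sec=3 * 86400)
old = StorageServer("ssd-redwood-1-experimental", age_sec=30 * 86400)
print(eligible_for_wiggle(young, "ssd-redwood-1-experimental"))  # False
print(eligible_for_wiggle(old, "ssd-redwood-1-experimental"))    # True
```

Under this rule a freshly migrated cluster stays quiet for about three weeks after the last storage server was recreated, which is the difference from the 7.1 behavior seen earlier in this thread.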

While the default ssd / SQLite based engine does not need the PW, many workloads will see an increase of CPU and IO overhead per operation over time as the BTree becomes more fragmented, which is mitigated by the PW periodically draining, recreating, and refilling it.

Strictly speaking, Redwood does need the PW at least for some workloads, though most probably could run without it. Redwood makes some design tradeoffs to reduce the IO and CPU cost of writes, specifically not merging under-filled sibling pages and not maintaining internal metadata necessary to shrink data files to give internal free space back to the filesystem. These decisions make sense in the context of a production FDB cluster with dedicated data volumes, as giving space back to the filesystem is unnecessary (it will be reused within Redwood) and FDB Data Distribution will move/balance key ranges around between Storage Servers which will effectively remove BTree slack and keep logical space usage roughly balanced.

If, however, your workload were to have an update pattern that produces an unusual amount of slack space constantly, or if you were to delete a large amount of data from your cluster and not replace it and you wanted all of your data files to shrink instead of holding free space internally, then the Perpetual Wiggle is a backstop to control slack and shrink data files.
