Cannot clear all keys by "clearrange \x00 \xff"

We have a huge database for testing purposes, and now we want to re-run the test, so we need to clear the whole database.

I’ve searched a lot and every answer points to the same command: clearrange \x00 \xff.
But after running this command, there is still a lot of data in the database:

  Sum of key-value sizes - 2.030 TB
  Disk space used        - 10.812 TB

So I have two questions:

  1. Is this the correct way to clear the whole database?
  2. How could I free the whole disk space usage?

Thanks in advance!

I’m assuming you meant to type clearrange \x00 \xff and not \ff.

Past that, deletes do not take effect immediately on disk. There is a background process which periodically frees old disk pages. It was changed relatively recently to run more often in this PR: https://github.com/apple/foundationdb/pull/1485/files

That PR also lists some knobs you can fiddle with if your workload involves a lot of deletes and you want to dedicate more IO and CPU to the cleanup so it runs more frequently and does more work per run.

Definitely test these out on a non-prod cluster first!

Hi, you are correct, I meant to type clearrange \x00 \xff. I think I have waited a relatively long time (one day), but there is no difference at all.

It looks like there really are still 2 TB of key-value data in the database, but I just don’t know where they are or how to delete them.

There hasn’t been any progress at all during that day? What did you start off with?

The only thing that command wouldn’t delete AFAIK is data in the system keyspace, and there wouldn’t be 2TB of data in there unless someone misconfigured their application to write into it.

I actually use go-ycsb to benchmark the cluster. It is possible that the implementation of go-ycsb’s driver is misconfigured, but since it just uses the client, would the client write data into the system keyspace?

And yes, there hasn’t been any progress at all during the day, as it is just a test environment.

I am almost sure that there can be keys before \x00 too.

In the Java bindings, I usually create a range like new byte[]{}, new byte[]{(byte) 0xFF} to capture the entire key space for deletion, etc.

Is it possible that you have keys written in the keyspace that sort before \x00? I do not know off-hand how to list or delete those from the CLI; can you write a small code snippet to test whether you get any data back with the range I mentioned above?

db.run(tx -> {
    // Clears everything from the empty key up to (but not including) 0xFF
    tx.clear(new byte[]{}, new byte[]{(byte) 0xFF});
    return null;
});
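
If you just want to check for data rather than clear it, a small read in whichever bindings you use would do. In the Python bindings, a rough (untested) sketch might look like this - the API version and function name are just examples, so adjust them to your client:

import fdb

fdb.api_version(620)  # example; use your client's API version
db = fdb.open()

@fdb.transactional
def first_keys(tr):
    # Read up to 10 keys starting from the empty key across the normal key space
    return list(tr.get_range(b'', b'\xff', limit=10))

for kv in first_keys(db):
    print(kv.key, len(kv.value))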

Also, I’ve observed that FDB instantly reduces the key-value sizes when keys are cleared, but Disk space used gradually comes down as the background vacuuming progresses. In your case, it seems like there are a lot of live keys in the DB.

Yes, I observed the same behavior you describe: the key-value sizes go down instantly while the disk usage goes down slowly. In my case, I think there really are still a lot of live keys :frowning:

I will try your advice, thanks a lot. I also hope we can get a way to check all keys, including the hidden ones, through the CLI.

There is only one key before \x00, which is the empty key. This is why I typically recommend using the following command in fdbcli to clear the whole database (note you must use double quotes " instead of single quotes '):

clearrange "" \xff

When you run a clear range like this, you should see the sum of key value sizes drop quickly, but the disk space may take a while to recover. The empty key won’t hold 2 TB, though, so that wouldn’t explain your issue.

What happens if you run a range read on the main key space? It should return no keys:

getrange "" \xff

If that is empty, the next thing to check would be the \xff key space. The most likely way of accumulating data there would be if you turned on backup or DR but didn’t have any agents to do the associated work.
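
If you want to peek into the \xff key space directly, you can do so with a normal range read after enabling system-key access. A rough Python sketch (the API version here is just an example - use the one that matches your client):

import fdb

fdb.api_version(620)  # example; match your client's API version
db = fdb.open()

@fdb.transactional
def peek_system_keys(tr):
    # Allow reads of the \xff system key space, then grab a handful of keys
    tr.options.set_access_system_keys()
    return list(tr.get_range(b'\xff', b'\xff\xff', limit=20))

for kv in peek_system_keys(db):
    print(kv.key)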

An effective way to check this is to use the locality API to get shard boundaries. If you really have a lot of data somewhere, there should be a lot of shard boundaries that indicate where. In the Python bindings, for example, you would use:

fdb.locality.get_boundary_keys(db, b"", b"\xff\xff")
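
As a slightly fuller sketch, you can count and print the boundaries (again, the API version is just an example):

import fdb

fdb.api_version(620)  # example; match your client's API version
db = fdb.open()

# Each boundary key is the start of a shard; many boundaries clustered in one
# region of the key space means a lot of data is stored there.
boundaries = list(fdb.locality.get_boundary_keys(db, b'', b'\xff\xff'))
print("number of shard boundaries:", len(boundaries))
for key in boundaries[:20]:
    print(key)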

If that doesn’t turn up anything, or if it turns up a bunch of shards which are in fact empty, then something else may be going wrong.


Thanks for your reply! As I have to keep the test going, I eventually reinitialized the whole cluster. But I think it will happen again during the rest of the test. I will try your advice later.

Some additional information: because it is a test, I frequently change the settings of the cluster, so that might cause some problems… or not?

What settings are those?

The classes of the processes. I change them to figure out the best combination and to understand them better.

That typically shouldn’t cause any issues. However, I am not sure how well we test those scenarios.

If you want to be on the super safe side (might not be the case here - but just in case you want to do this in production), it is usually better to first exclude the process and then change its class.
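
Roughly, and assuming you set the class through fdbcli rather than in foundationdb.conf (the address below is made up), that would be: exclude the process, wait for the exclusion to finish, change its class, then include it again:

exclude 10.0.0.1:4500
setclass 10.0.0.1:4500 stateless
include 10.0.0.1:4500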

Hi, All.

I’m tuning an application that is running against fdb.

The problem is that after clearing a large range, FoundationDB starts performing background disk I/O activity that influences the application’s performance.

  1. Are there any knobs to control the aggressiveness of this I/O activity?
  2. Is there any way to temporarily enable/disable this background activity?

I think the knobs are already optimized for this use case. I don’t believe this will ever get better with the SQLite storage engine (and I would expect it to get much worse with RocksDB).

I would recommend you try Redwood in FDB 7.1, which should at this point be pretty stable. We don’t yet recommend using Redwood in production, but if you can reproduce your issue in a testing environment, you could verify whether Redwood solves this problem for you. We will hopefully declare Redwood stable very soon.
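
If you want to try it on a test cluster, I believe the switch is just a storage engine reconfiguration in fdbcli, something like the line below - but double-check the exact engine name for your 7.1 build, since it has changed across releases:

configure ssd-redwood-1-experimental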

But the user should have a choice between faster cleaning and better application performance.

There are a number of knobs that relate to this cleaning function that you can tune:

One knob in particular, SPRING_CLEANING_LAZY_DELETE_INTERVAL, used to be set at 1.0 and was reduced to 0.1 to cause lazy deletion to run 10x more frequently. You could try increasing it to slow it back down, but one potential consequence of this is that a cluster may not be able to reclaim space from data movement, etc. very effectively.

You could also try increasing SPRING_CLEANING_VACUUM_INTERVAL. This is already fairly slow, and the process is not usually necessary anyway, since FDB can reuse unvacuumed space. If you run multiple processes on the same disk or share it with non-FDB things, though, vacuuming may still be helpful to return unused space to the OS.
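
If you want to experiment, knobs are passed to fdbserver at startup. As a sketch (assuming the usual foundationdb.conf layout; the value here is just the old default mentioned above), you could add a line like this to the [fdbserver] section and restart the processes:

knob_spring_cleaning_lazy_delete_interval = 1.0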
