We have been running an fdb cluster, version 6.0.15, in double replication and memory storage mode, across 3 nodes with 4 processes on each node (not on cloud). The total KV pairs went up to ~200GB and the disk space used went up to ~3TB. We wanted to free up some disk space and hence did a clearrange of some old data which we no longer need. The KV pairs size went down to ~60GB, but the disk space used is still ~2.9TB. It’s been more than 24 hours now and the disk space does not seem to be going down. There are no clients connecting to the cluster, and status details says all the processes are using 1% cpu and 0% disk.
I wanted to understand what is happening in this case, and whether there’s any way to forcibly kick off a cleanup process to bring down the disk space usage.
In the memory storage engine, the files on disk never actually get smaller. That means if they grow large, they will continue to stay large forever. When you free up space in the database by clearing keys, it should be the case that you start using less of the file, and therefore the extra space allocated is available for future use by the database process.
If you need to reclaim the space for other purposes (e.g. other uses on the host besides the database), or you just want the files to be physically smaller, there are some tricks you can use to make that happen. One is to convert your storage engine to another (e.g. ssd) and back, which will cause it to rewrite the files. If you do this, I think you’ll need to wait for the conversion to complete and the memory storage engine files to be deleted before switching back. Note that your storage requirements will temporarily go up while you’re storing the data in both formats. Also, you’ll of course take on the performance properties of the ssd storage engine, and depending on what you’re doing this may not be acceptable.
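To make that concrete, here’s roughly what the round trip looks like in fdbcli. This is only a sketch: it keeps your existing double replication, and the `status details` call is just a way to watch progress, not a precise “conversion complete” signal.

```
# Switch from the memory engine to ssd; the data is rewritten in the
# background and the memory engine files are eventually deleted.
fdbcli --exec 'configure ssd'

# Watch progress (and the on-disk file sizes) until the conversion finishes.
fdbcli --exec 'status details'

# Once the old memory engine files are gone, switch back.
fdbcli --exec 'configure memory'
```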
Another option is to use exclude. The goal is to exclude every process in the cluster so that it deletes its data files, and you end up with a bunch of new, smaller files at the end. There are a variety of ways you could accomplish this, such as:

- migrate to an entirely new set of hosts (add 3 new nodes, exclude the 3 old nodes, and let the data move)
- exclude processes one at a time until each is fully removed, then add it back (see the fdbcli sketch after this list; note that with this procedure your files will get somewhat larger than the baseline requirements because of the data movement needed to drain each excluded process)
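As a rough sketch of the one-at-a-time approach, for a single process (the address 10.0.0.1:4500 is made up; substitute each of your processes in turn):

```
# Exclude one process; the cluster moves its data elsewhere and, once the
# process no longer hosts any data, its files are deleted.
fdbcli --exec 'exclude 10.0.0.1:4500'

# With no arguments, exclude lists what is currently excluded; use this plus
# status details to confirm the process has been fully drained.
fdbcli --exec 'exclude'
fdbcli --exec 'status details'

# Add it back; it will create fresh (smaller) files as data moves onto it.
fdbcli --exec 'include 10.0.0.1:4500'
```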
Looking into it a bit more, it appears that this behavior of the disk queue files has changed in 6.1, so that they can now be truncated to a smaller size. That doesn’t affect you on 6.0, of course, but it would be relevant if you upgrade, or for anyone running more recent versions.
I don’t see a release note describing the difference, and I’m not yet certain if it applies universally or only in certain configurations or circumstances.
Adding to this, even with 6.1 you won’t necessarily see the memory storage engine files shrink right away, but they eventually will shrink.
Some more detail…
The memory storage engine is composed of two files. A file is grown by appending commits to it, each of which consists of snapshotted key/value pairs from the storage engine plus new incoming mutations. The amount of the former is based on the amount of the latter; I think the ratio is either 2:1 or 1:1 currently, I can’t remember.
When you delete a large amount of data on a particular storage server, one of the two files will still contain all of the data that used to exist until that file can be truncated. This truncation cannot happen until the other of the two files contains a full snapshot of the assigned keyspace, which could take a while depending on your write workload, since snapshot bytes are only written as an add-on to mutation bytes.
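If you want to watch for that truncation, one rough way (assuming the default Linux layout where each process keeps its files under /var/lib/foundationdb/data/<port>) is to track the per-process data directory sizes over time:

```
# Per-process data directory sizes; with the memory engine these only shrink
# once the other file holds a full snapshot (and on 6.0 generally not at all).
du -sh /var/lib/foundationdb/data/*
```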
@ajbeamon - In the case of the ssd storage engine, if I need to reclaim storage space, what is the fastest way of doing so? As suggested in this post of yours, can I just change knob_cleaning_interval, or is there some other way to reclaim the disk space quicker? I can stop all reads and writes to the fdb cluster while I am performing this operation.
The same tricks described here work in your case too, though switching to something like the memory storage engine may not be feasible if you are storing lots of data.
Changing the knobs described in that post is the other available option.
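If you do try the knob route, one way to set a knob (a sketch only; I’m reusing the knob name from the post you referenced, and you should check that it exists and what value makes sense on your version) is to add it under the [fdbserver] section of foundationdb.conf on each node and then restart the fdbserver processes:

```
# /etc/foundationdb/foundationdb.conf
[fdbserver]
# Run the cleaning pass more often than the default; check the referenced
# post for a sensible value on your version.
knob_cleaning_interval = 0.1
```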