FDB new features: fill factor

Hi,
Could anybody please help with the questions below?

  1. What are the main advantages of using a fill factor in FDB?
  2. What is the benefit of 0.66 as the default fill factor value?
  3. With a fill factor of less than 1, Redwood splits data into pages, so what kind of data will be saved in the unfilled part of a page?

Thanks,

Very good questions, and I actually do not have great answers yet.

The default value of 0.66 is just something I picked arbitrarily, I am actually not sure that the value shouldn’t just be 1 for most or all workloads. I have not done much experimentation around this.

The idea of leaving some page slack is that when you insert records into a page, if you overflow the page then you must split it. If there is some slack, then perhaps you can add the records, if they fit, and move on. A page split is expensive compared to adding some records to a page. The split involves reading all of the records in the page and then splitting them into 2 or more groups, building a new page for each group, and then adding child pointers to those new pages to their collective parent page. If this causes the parent page to overflow, then it too must be split, which can then cause its parent to overflow, and so on.
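The split-and-propagate behavior described above can be sketched roughly as follows. This is a simplified illustrative model, not Redwood's actual code: the tiny CAPACITY, the node layout, and the omission of child-pointer bookkeeping are all simplifications.

```python
# Simplified B-tree page-split model showing why splits are expensive and can
# cascade upward. Illustrative only -- not Redwood's actual implementation.

CAPACITY = 4  # max records per page (real pages hold far more)

class Page:
    def __init__(self, records=None, children=None):
        self.records = records or []   # sorted keys (values omitted here)
        self.children = children       # None for leaf pages

def split(page, parent_stack, splits=None):
    """Split an overflowing page; the parent may then overflow recursively.
    Child-pointer updates are omitted to keep the sketch short."""
    splits = splits if splits is not None else []
    if len(page.records) <= CAPACITY:
        return splits
    # Read ALL records in the page and divide them into two groups.
    mid = len(page.records) // 2
    left = Page(page.records[:mid], page.children)
    right = Page(page.records[mid:], page.children)
    splits.append(page.records[mid])   # boundary key pushed to the parent
    if parent_stack:
        parent = parent_stack[-1]
        # Parent gains a boundary key (and a child pointer) -- so it too
        # may overflow, which is the cascade described above.
        parent.records.append(page.records[mid])
        parent.records.sort()
        split(parent, parent_stack[:-1], splits)
    return splits

# Inserting into a full leaf forces a split; with page slack (fill factor < 1)
# the insert would often fit without any split at all.
leaf = Page(records=[10, 20, 30, 40])
root = Page(records=[], children=[leaf])
leaf.records = sorted(leaf.records + [25])  # now 5 > CAPACITY: overflow
boundaries = split(leaf, [root])
print(boundaries)
```

With slack, the same insert would simply land in the leaf's free space and no boundary key would ever reach the parent.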

Depending on the workload, it could be beneficial to leave some slack in pages when so much data is inserted into a page that it must split into many pages. (Note that if a page splits into just two pages then each should be about half full, but Redwood does not currently do this.) Then, the next several insertions into any of those pages will require a lot less work; on the other hand, if nothing is ever inserted into those pages then the space is just wasted.

You asked what data would be there - at the leaf level it would be more user records that belong within the page’s key boundaries, and at the internal levels of the tree it would be key boundaries used to make traversal decisions that point to new child pages added when their existing children are split.

I said “depending on the workload” above, but I am honestly not sure there is a workload which would see a long term benefit. There could possibly be an initial benefit when writing a lot of random KV pairs across most of the keyspace such that additional writes do fit within the page slack left behind, but then eventually the pages will fill up and need to be split anyway, so perhaps not filling them full initially is just delaying the inevitable?

Try values up to 0.99 for your workload (right now 1.0 will sometimes cause an error for complicated reasons) and look in the RedwoodMetrics trace events. I expect that the higher the fill factor is, the higher the L<number>PageBuild fields in the trace events will be. This means the workload is using more CPU, but for this additional CPU you get fewer IOPS (both write and read, because pages contain more data) and less disk space used, because there is less slack in the structure. If your storage processes are not CPU bound under load, then this extra CPU cost is probably worth it.
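As a back-of-the-envelope illustration of what the slack buys you: the 8 KiB page size and zero per-record overhead below are assumptions for the sake of arithmetic, not Redwood's actual layout, and the 300-byte record size is just an example.

```python
# Rough slack arithmetic (illustrative assumptions: 8 KiB pages, no
# per-record overhead, fixed 300-byte records).

PAGE_SIZE = 8192          # bytes, assumed
RECORD_SIZE = 300         # bytes per KV pair, example value

results = {}
for fill_factor in (0.66, 0.99):
    used = int(PAGE_SIZE * fill_factor)
    slack = PAGE_SIZE - used
    # How many extra inserts a page can absorb before it must split:
    results[fill_factor] = slack // RECORD_SIZE
    print(f"fill={fill_factor}: ~{slack} bytes slack, "
          f"~{results[fill_factor]} extra {RECORD_SIZE}-byte inserts per page")
```

Under these assumptions, 0.66 leaves room for a handful of follow-up inserts per page, while 0.99 leaves essentially none, which is the trade-off described above.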

I should also point out that merging under-filled pages with their siblings is not yet written, so there are pathologically bad workloads that can cause a lot of BTree slack space. In many cases this would actually show up in the RedwoodMetrics trace events, where the L<number>BldAvgFillPct and L<number>ModAvgFillPct percentages will be low. An average under 50 suggests there is too much slack in the tree.
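Fdbserver trace files are XML event lines, so spotting low fill percentages can be scripted. The attribute names below come from the reply above; the exact event layout and a RedwoodMetrics event per line are assumptions, so treat this as a sketch to adapt.

```python
# Sketch: scan fdbserver XML trace text for RedwoodMetrics events and flag
# L<number>BldAvgFillPct / L<number>ModAvgFillPct values under a threshold.
# The event/attribute layout is an assumption -- adjust for your trace files.
import re

FILL_ATTR = re.compile(r'(L\d+(?:Bld|Mod)AvgFillPct)="([\d.]+)"')

def low_fill_fields(trace_text, threshold=50.0):
    """Return (field, value) pairs whose average fill pct is below threshold."""
    flagged = []
    for line in trace_text.splitlines():
        if 'Type="RedwoodMetrics"' not in line:
            continue
        for field, value in FILL_ATTR.findall(line):
            if float(value) < threshold:
                flagged.append((field, float(value)))
    return flagged

# Example with a synthetic trace line (real events carry many more attributes):
sample = '<Event Type="RedwoodMetrics" L1BldAvgFillPct="42.5" L2ModAvgFillPct="71.0"/>'
print(low_fill_fields(sample))
```

An average under 50 in the flagged output is the "too much slack" signal mentioned above.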

Such detailed explanations! Thanks very much! I read it through and still have several questions; could you please help with them:

  1. When doing an update operation in FDB, what is the process? Is the value updated in place or not? Is the old value kept rather than deleted, just marked as invalid, with FDB then writing the new value to a different node?

  2. Below is my previous test result (storage engine: Redwood; FDB release version: 6.2; fill factor not used):
Using du -BG /opt/data/ we get the file size information below:
31 /opt/data/fdb-6.2/data/4519
30 /opt/data/fdb-6.2/data/4505
30 /opt/data/fdb-6.2/data/4521
30 /opt/data/fdb-6.2/data/4502
30 /opt/data/fdb-6.2/data/4525
30 /opt/data/fdb-6.2/data/4509
30 /opt/data/fdb-6.2/data/4516
30 /opt/data/fdb-6.2/data/4503
30 /opt/data/fdb-6.2/data/4523
30 /opt/data/fdb-6.2/data/4530
30 /opt/data/fdb-6.2/data/4512
31 /opt/data/fdb-6.2/data/4526
30 /opt/data/fdb-6.2/data/4507
30 /opt/data/fdb-6.2/data/4511
30 /opt/data/fdb-6.2/data/4510
30 /opt/data/fdb-6.2/data/4524
30 /opt/data/fdb-6.2/data/4528
31 /opt/data/fdb-6.2/data/4517
30 /opt/data/fdb-6.2/data/4508
30 /opt/data/fdb-6.2/data/4504
30 /opt/data/fdb-6.2/data/4518
30 /opt/data/fdb-6.2/data/4515
30 /opt/data/fdb-6.2/data/4527
30 /opt/data/fdb-6.2/data/4514
2 /opt/data/fdb-6.2/data/4501
30 /opt/data/fdb-6.2/data/4513
30 /opt/data/fdb-6.2/data/4506
30 /opt/data/fdb-6.2/data/4529
30 /opt/data/fdb-6.2/data/4520
30 /opt/data/fdb-6.2/data/4522
1 /opt/data/fdb-6.2/data/4500
863 /opt/data/fdb-6.2/data

There are 29 storage servers, and the saved db size is about the same for each: 30G or 31G. The two entries showing 1 and 2 are not storage servers; they are stateless and log servers. All the fdb servers run on the same workstation.
For the test, only about 600G was loaded (total record count = 2,000,000,000; each record is 300 bytes), but the total final data set is up to 863G (see the last line).
My questions:
I loaded 600G of data but the filesystem's du reports 863G, so is the 863 - 600 = 263G gap coming from indexes or something else?
Is there any way to check the index size?

Thanks,

When you load N bytes of key-value data into FDB, remember that FDB has to:

a) replicate it per your replication factor - what is your replication set to? (single/double/triple?)

b) add overhead which describes those key and value strings and organizes them into searchable structures. The BTree storage engines used in FDB (SQLite, Redwood) do not so much contain indexes as they are indexes. They are trees of key-value pairs. The values in the tree are the values from the user, not pointers into another data structure.
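A quick sanity check of the reported numbers, under single replication (so no replication multiplier), treating du's G units as plain GB for simplicity and not splitting the overhead into structure vs. internal free space:

```python
# Rough overhead arithmetic using the numbers reported earlier in the thread.
# Single replication, so the gap is entirely structural overhead plus slack.

records = 2_000_000_000
record_size = 300                          # bytes per record
logical_gb = records * record_size / 1e9   # user data actually loaded
on_disk_gb = 863                           # total reported by du

overhead_gb = on_disk_gb - logical_gb
overhead_pct = 100 * overhead_gb / logical_gb
print(f"logical: {logical_gb:.0f} GB, on disk: {on_disk_gb} GB, "
      f"overhead: {overhead_gb:.0f} GB (~{overhead_pct:.0f}% of user data)")
```

That ballpark overhead fraction is what points (a) and (b) above, plus the internal free space discussed next, have to account for between them.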

With Redwood specifically, prefix compression can offset the overhead described above, but Redwood also does not shrink files when space is freed internally, so they always contain free space which will be reused during writes but does not show up in the filesystem. See the StorageMetrics trace event for an accounting of the free space on the disk and the total free space available for reuse (which includes the internal space), although I don’t believe that figure was accurate for Redwood in FDB 6.2.
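The general key prefix compression technique looks like the sketch below. This shows the idea, not Redwood's actual on-disk encoding: each key in sorted order stores only the suffix after the prefix it shares with the previous key.

```python
# Generic key prefix compression sketch (the technique in general, not
# Redwood's actual format). Sorted keys are stored as (shared_len, suffix).

def compress(keys):
    """Encode sorted byte-string keys as (shared_prefix_len, suffix) pairs."""
    out, prev = [], b""
    for key in keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out

def decompress(pairs):
    """Rebuild the full keys from the (shared_len, suffix) encoding."""
    keys, prev = [], b""
    for shared, suffix in pairs:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys

keys = [b"/app/user/0001", b"/app/user/0002", b"/app/user/0100"]
packed = compress(keys)
assert decompress(packed) == keys
raw = sum(len(k) for k in keys)
enc = sum(len(s) + 1 for _, s in packed)  # +1 byte for the prefix length
print(f"raw {raw} bytes -> ~{enc} bytes encoded")
```

With keys sharing long common prefixes, as is typical in FDB keyspaces, this is why compression can claw back a meaningful part of the per-record overhead.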

All updates in a page in Redwood are initially performed on a copy of the page. The in-page structure only supports marking existing entries as deleted and inserting new entries, so if the page can’t fit the edits then it is rebuilt which will remove the deleted entries. The new page is written to disk, and once committed the pager will redirect reads of the original page ID at the new commit version to the new page, transparently.
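The process just described can be modeled roughly as follows. This is an illustrative sketch only: the tiny capacity and the tuple layout are stand-ins for Redwood's far more sophisticated in-page structure.

```python
# Simplified model of the copy-on-write page update described above: edits go
# into a copy of the page, deletes become tombstone markers, and if the edits
# do not fit, the page is rebuilt, dropping tombstoned entries. Illustrative
# only -- not Redwood's real in-page structure.

PAGE_CAPACITY = 4  # max live + tombstoned entries (real pages hold far more)

def apply_edits(page, deletes, inserts):
    """Return a new page (list of (key, value, deleted)) with edits applied."""
    # Work on a copy of the page; the original remains readable at old versions.
    new_page = [(k, v, dead or k in deletes) for k, v, dead in page]
    new_page += [(k, v, False) for k, v in inserts]
    if len(new_page) > PAGE_CAPACITY:
        # Rebuild: drop tombstoned entries to reclaim their space.
        # (If it still overflowed after that, the page would have to split.)
        new_page = [(k, v, False) for k, v, dead in new_page if not dead]
    return sorted(new_page)

page = [("a", 1, False), ("b", 2, False), ("c", 3, False)]
# Delete "b" and insert two new keys: 5 slots needed -> rebuild drops "b".
updated = apply_edits(page, deletes={"b"}, inserts=[("d", 4), ("e", 5)])
print(updated)
```

The original `page` list is untouched, mirroring how the pager keeps serving the old page at old versions until the commit redirects reads to the new one.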

Thanks very much again!

a) replicate it per your replication factor - what is your replication set to? (single/double/triple?)

my replication is set to single

load 600G data, File System du detects 863G, so the gap of 863-600=263G

So is the extra 263G the total size of the items below?

  1. descriptive information for the searchable structures
  2. some internal free space
  3. a very limited amount of index data?

All updates in a page in Redwood are initially performed on a copy of the page. The in-page structure only supports marking existing entries as deleted and inserting new entries, so if the page can’t fit the edits then it is rebuilt which will remove the deleted entries. The new page is written to disk, and once committed the pager will redirect reads of the original page ID at the new commit version to the new page, transparently.

For this:

  1. Does “in-page structure” mean those pages differ from the copy of the page?
  2. Does “so if the page can’t fit the edits” mean that the free space of the current page is not enough to hold the newly inserted entries (so the tree will be reorganized)?
  3. “then it is rebuilt which will remove the deleted entries”: after the deleted entries are removed, will the space be reused?
  4. Based on the update process in FDB’s Redwood engine, it seems that setting a smaller fill factor could help improve IOPS for update workloads, not just insert workloads. Am I right? Could you please correct me if not?

Regards,