Very good questions, and I actually do not have great answers yet.
The default value of 0.66 is just something I picked arbitrarily; I am actually not sure that the value shouldn’t just be 1 for most or all workloads. I have not done much experimentation around this.
The idea of leaving some page slack is that when you insert records into a page, if you overflow the page then you must split it. If there is some slack, then perhaps the new records fit and you can just add them and move on. A page split is expensive compared to adding some records to a page: it involves reading all of the records in the page, splitting them into 2 or more groups, building a new page for each group, and then adding child pointers for those new pages to their shared parent page. If this causes the parent page to overflow, then it too must be split, which can then cause its parent to overflow, and so on.
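In case a concrete picture of that cascade helps, here is a toy sketch in Python. It is not Redwood's actual code: it sizes pages by record count instead of bytes, ignores the fill factor entirely, and every name in it is made up purely for illustration. The only point is that one leaf overflow produces replacement pages, the parent then has to absorb pointers to them, and it may overflow and split in turn.

```python
# Toy sketch of the split cascade, not Redwood's actual code: pages have a
# fixed capacity in record counts (real pages are sized in bytes), and all
# names here are made up for illustration.
PAGE_CAPACITY = 4  # max records per leaf, or child pointers per internal page

class Page:
    def __init__(self, items=None, children=None):
        self.items = items or []   # sorted keys (leaf pages only)
        self.children = children   # list of child Pages (internal pages only)

    def is_leaf(self):
        return self.children is None

    def min_key(self):
        return self.items[0] if self.is_leaf() else self.children[0].min_key()

def split(page):
    """Split an overfull page into two roughly half-full pages."""
    if page.is_leaf():
        mid = len(page.items) // 2
        return [Page(items=page.items[:mid]), Page(items=page.items[mid:])]
    mid = len(page.children) // 2
    return [Page(children=page.children[:mid]), Page(children=page.children[mid:])]

def insert(page, key):
    """Insert `key` under `page`; returns the page(s) that replace `page`.
    More than one page in the result means `page` had to split."""
    if page.is_leaf():
        page.items = sorted(page.items + [key])
        return split(page) if len(page.items) > PAGE_CAPACITY else [page]
    # Route to the last child whose minimum key is <= key.
    idx = 0
    for i, child in enumerate(page.children):
        if key >= child.min_key():
            idx = i
    replacements = insert(page.children[idx], key)
    # If the child split, this parent gains pointers to the new pages, which
    # may push it over capacity and force it to split as well.
    page.children[idx:idx + 1] = replacements
    return split(page) if len(page.children) > PAGE_CAPACITY else [page]

def insert_root(root, key):
    replacements = insert(root, key)
    # If the root itself splits, the tree grows a new level.
    return replacements[0] if len(replacements) == 1 else Page(children=replacements)

if __name__ == "__main__":
    root = Page(items=[0])
    for k in range(1, 50):
        root = insert_root(root, k)
```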
Depending on the workload, it could be beneficial to leave some slack in pages when so much data is inserted into a page that it must split into many pages. (Note that if the page is splitting into just two pages then each should end up about half-full, but Redwood does not currently do this.) Then, the next several insertions into any of those pages will require a lot less work; on the other hand, if nothing is ever inserted into those pages then the space is just wasted.
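To make that concrete with some made-up numbers (ignoring per-record overhead): with 8 KiB pages, distributing 40 KiB of records at a fill factor of 1.0 would build 5 completely full pages, while a fill factor of 0.66 would build ceil(40 / (8 × 0.66)) = 8 pages, each holding about 5 KiB and leaving roughly 3 KiB of slack for future insertions.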
You asked what data would be there: at the leaf level it would be more user records that belong within the page’s key boundaries, and at the internal levels of the tree it would be additional child pointers, with the key boundaries used to make traversal decisions, pointing to new pages created when existing children are split.
I said “depending on the workload” above, but I am honestly not sure there is a workload that would see a long-term benefit. There could possibly be an initial benefit when writing a lot of random KV pairs across most of the keyspace, such that additional writes do fit within the page slack left behind, but eventually the pages will fill up and need to be split anyway, so perhaps not filling them completely at first is just delaying the inevitable?
Try values up to 0.99 for your workload (right now 1.0 will sometimes cause an error for complicated reasons) and look at the RedwoodMetrics trace events. I expect that the higher the fill factor is, the higher the L<number>PageBuild fields in those events will be. This means the workload is using more CPU, but for this additional CPU you get fewer IOPS (both write and read, because pages contain more data) and less disk space is used because there is less slack in the structure. If your storage processes are not CPU bound under load, then this extra CPU cost is probably worth it.
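If it helps, here is a rough sketch of how one might pull those counters out of the trace logs. It assumes the trace logs are in JSON format (one event object per line) and that RedwoodMetrics events carry the L<number>PageBuild fields as top-level numeric fields; the file name and exact field layout are assumptions on my part, so treat this as a starting point rather than a working tool.

```python
# Rough sketch, not a working tool: scan JSON-formatted trace logs (one JSON
# object per line) for RedwoodMetrics events and total up any field whose
# name starts with L<number>PageBuild. The file name, the "Type" key, and the
# exact field names/layout are assumptions based on this post.
import json
import re
from collections import Counter

PAGE_BUILD_RE = re.compile(r"^L\d+PageBuild")

def page_build_totals(trace_file):
    """Sum every L<number>PageBuild* field across all RedwoodMetrics events."""
    totals = Counter()
    with open(trace_file) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if event.get("Type") != "RedwoodMetrics":
                continue
            for name, value in event.items():
                if PAGE_BUILD_RE.match(name):
                    totals[name] += float(value)
    return totals

if __name__ == "__main__":
    # Hypothetical trace file name; point this at a real storage server trace log.
    for field, total in sorted(page_build_totals("trace.json").items()):
        print(f"{field}: {total:.0f}")
```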
I should also point out that merging under-filled pages with their siblings is not yet written, so there are pathologically bad workloads that can cause a lot of BTree slack space. In many cases this would actually show up in the RedwoodMetrics trace events, where the L<number>BldAvgFillPct and L<number>ModAvgFillPct percentages will be low. An average under 50 suggests there is too much slack in the tree.
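Along the same lines, and with the same assumptions about the trace format, here is a sketch of averaging those fill percentages and flagging levels that look too empty:

```python
# Same assumptions as the sketch above (JSON trace lines, field names taken
# from this post): average the L<number>BldAvgFillPct and L<number>ModAvgFillPct
# fields across RedwoodMetrics events and flag anything averaging under 50.
import json
import re
from collections import defaultdict

FILL_PCT_RE = re.compile(r"^L\d+(Bld|Mod)AvgFillPct$")

def average_fill_pct(trace_file):
    samples = defaultdict(list)
    with open(trace_file) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if event.get("Type") != "RedwoodMetrics":
                continue
            for name, value in event.items():
                if FILL_PCT_RE.match(name):
                    samples[name].append(float(value))
    return {name: sum(vals) / len(vals) for name, vals in samples.items()}

if __name__ == "__main__":
    for field, avg in sorted(average_fill_pct("trace.json").items()):
        note = "  <-- suggests too much slack" if avg < 50 else ""
        print(f"{field}: {avg:.1f}{note}")
```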