I am adding support for fdb_transaction_get_range_split_points to the .NET binding, and while doing some basic testing, I am getting a list of split points that seems very unevenly distributed, with a lot of variation between chunks (multiple orders of magnitude from smallest to largest).
My test runs on a 7.1.24 multi-process cluster that is mostly empty (used for testing). I insert a set of 50k keys, all with 50-byte random values (~2.5 MB of data).
I then get the list of split points with a chunk size of 12,500 bytes (I’m expecting ~200 chunks in return), and query the keys between each pair of consecutive split points to count the number of keys and their total size. This gives me 109 split points (108 chunks), ranging from as small as 200 bytes (4 keys!) to as large as 150 KB, with an average of 23 KB, and almost none that match the requested chunk size (see below for the dump of the keys).
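For reference, here is a minimal Python sketch of that measurement step. It assumes the range has already been read into a sorted in-memory list of (key, value) pairs; that stand-in and the function name measure_chunks are hypothetical and not part of any binding. Each pair of consecutive split points is treated as a half-open range [begin, end), which matches the counts in the dump (e.g. (0,) .. (60,) holds 60 keys), and the ratio is the value bytes divided by the requested chunk size.

```python
from bisect import bisect_left

def measure_chunks(items, split_points, chunk_size):
    """Measure each chunk delimited by consecutive split points.

    items: sorted list of (key, value) pairs (hypothetical stand-in for
           reading the range from the database).
    split_points: sorted keys; the first and last are the range bounds.
    Returns (begin, end, key_count, value_bytes, ratio_percent) per chunk.
    """
    keys = [k for k, _ in items]
    report = []
    for begin, end in zip(split_points, split_points[1:]):
        lo = bisect_left(keys, begin)  # index of first key >= begin
        hi = bisect_left(keys, end)    # index of first key >= end, so end is excluded
        value_bytes = sum(len(v) for _, v in items[lo:hi])
        report.append((begin, end, hi - lo, value_bytes,
                       round(100 * value_bytes / chunk_size)))
    return report

# Usage: 50,000 tuple-like keys with 50-byte values, measured against
# the first few split points from the dump below.
items = [((i,), b"x" * 50) for i in range(50_000)]
for row in measure_chunks(items, [(0,), (60,), (520,), (622,)], 12_500):
    print(row)
```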
The very small chunks are probably the remainders left over when a shard is split into chunks, but I’m more surprised by the unevenness of the other chunks, some of which are more than 10x the expected size.
I tried changing the chunk size, and the number of split points almost never changes. I also tried waiting several minutes between inserting the data and querying, and the behavior is mostly the same.
Changing the parameters (number of items, size of the values, chunk size, etc.) always gives the same result: the actual sizes of the chunks between split points are all over the place.
Is this something that others are seeing as well? Am I doing something wrong? Are my expectations of this API too strong? All the keys returned are valid keys, they are all contained within the range that I specified at the start, the first and last keys match the bounds of the range, etc.
I looked at the implementation of the getSplitPoints() method in StorageMetrics.actor.cpp, and I see that it probes the bytesample index to get an approximate set of keys spaced roughly chunkSize bytes apart. Could this be caused by a very uneven sampling of the keys by this actor?
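To illustrate how sampling alone could produce this pattern, here is a toy model in Python. It is not FoundationDB's code: the sampling rule (keep a pair with probability p = min(1, size / factor) and record it with weight size / p), the factor value, and the helper names are all assumptions picked for illustration. Split points are chosen by walking the sample and starting a new chunk at a sampled key once the weight accumulated since the previous split reaches chunk_size.

```python
import random

def sample_bytes(items, factor, rng):
    """Hypothetical byte sample (not FoundationDB's actual rule): keep each
    (key, value) pair with probability p = min(1, size / factor) and record
    it with weight size / p, so the expected total weight equals the true size."""
    sample = []
    for key, value in items:
        size = len(key) + len(value)
        p = min(1.0, size / factor)
        if rng.random() < p:
            sample.append((key, size / p))
    return sample

def split_from_sample(sample, begin, end, chunk_size):
    """Greedy walk over the sorted sample: start a new chunk at a sampled key
    once the weight accumulated since the previous split reaches chunk_size."""
    points = [begin]
    acc = 0.0
    for key, weight in sample:
        if acc >= chunk_size:
            points.append(key)
            acc = 0.0
        acc += weight
    points.append(end)
    return points

# 50,000 keys with 50-byte values; keys are zero-padded so that byte order
# matches numeric order. factor=25_000 is a hypothetical value.
items = [(b"%05d" % i, b"x" * 50) for i in range(50_000)]
sample = sample_bytes(items, factor=25_000, rng=random.Random(42))

for chunk_size in (6_250, 12_500, 20_000):
    points = split_from_sample(sample, b"00000", b"50000", chunk_size)
    print(chunk_size, len(points))
```

In this toy model every sampled pair carries a weight of about factor bytes, which exceeds all three chunk sizes tried, so every sampled key after the first becomes a split point: the number of split points follows the sample rather than chunk_size, and the real size of each chunk depends on the random gap between sampled keys. Whether the real byte sample behaves this way for small key-value pairs would have to be checked against StorageMetrics.actor.cpp.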
Here is the result of getting the split points with a chunk size of 12,500 bytes over a range of 50K keys with 50-byte values:
Creating 50,000 keys (50 bytes per key) with 2,500,000 total bytes
Get split points for chunks of 12,500 bytes...
Found 109 split points
> (0,) .. (60,): 60 results, size(values) = 3,000 bytes, ratio = 24%
> (60,) .. (520,): 460 results, size(values) = 23,000 bytes, ratio = 184%
> (520,) .. (622,): 102 results, size(values) = 5,100 bytes, ratio = 41%
> (622,) .. (687,): 65 results, size(values) = 3,250 bytes, ratio = 26%
> (687,) .. (897,): 210 results, size(values) = 10,500 bytes, ratio = 84%
> (897,) .. (1346,): 449 results, size(values) = 22,450 bytes, ratio = 180%
> (1346,) .. (1351,): 5 results, size(values) = 250 bytes, ratio = 2%
> (1351,) .. (2456,): 1,105 results, size(values) = 55,250 bytes, ratio = 442%
> (2456,) .. (2805,): 349 results, size(values) = 17,450 bytes, ratio = 140%
> (2805,) .. (3559,): 754 results, size(values) = 37,700 bytes, ratio = 302%
> (3559,) .. (4274,): 715 results, size(values) = 35,750 bytes, ratio = 286%
> (4274,) .. (5977,): 1,703 results, size(values) = 85,150 bytes, ratio = 681%
> (5977,) .. (6094,): 117 results, size(values) = 5,850 bytes, ratio = 47%
> (6094,) .. (6693,): 599 results, size(values) = 29,950 bytes, ratio = 240%
> (6693,) .. (6708,): 15 results, size(values) = 750 bytes, ratio = 6%
> (6708,) .. (6819,): 111 results, size(values) = 5,550 bytes, ratio = 44%
> (6819,) .. (6868,): 49 results, size(values) = 2,450 bytes, ratio = 20%
> (6868,) .. (9922,): 3,054 results, size(values) = 152,700 bytes, ratio = 1,222%
> (9922,) .. (11082,): 1,160 results, size(values) = 58,000 bytes, ratio = 464%
> (11082,) .. (12572,): 1,490 results, size(values) = 74,500 bytes, ratio = 596%
> (12572,) .. (13086,): 514 results, size(values) = 25,700 bytes, ratio = 206%
> (13086,) .. (13299,): 213 results, size(values) = 10,650 bytes, ratio = 85%
> (13299,) .. (13815,): 516 results, size(values) = 25,800 bytes, ratio = 206%
> (13815,) .. (13839,): 24 results, size(values) = 1,200 bytes, ratio = 10%
> (13839,) .. (14022,): 183 results, size(values) = 9,150 bytes, ratio = 73%
> (14022,) .. (14179,): 157 results, size(values) = 7,850 bytes, ratio = 63%
> (14179,) .. (14233,): 54 results, size(values) = 2,700 bytes, ratio = 22%
> (14233,) .. (14646,): 413 results, size(values) = 20,650 bytes, ratio = 165%
> (14646,) .. (15661,): 1,015 results, size(values) = 50,750 bytes, ratio = 406%
> (15661,) .. (15986,): 325 results, size(values) = 16,250 bytes, ratio = 130%
> (15986,) .. (16112,): 126 results, size(values) = 6,300 bytes, ratio = 50%
> (16112,) .. (18461,): 2,349 results, size(values) = 117,450 bytes, ratio = 940%
> (18461,) .. (19400,): 939 results, size(values) = 46,950 bytes, ratio = 376%
> (19400,) .. (19629,): 229 results, size(values) = 11,450 bytes, ratio = 92%
> (19629,) .. (20873,): 1,244 results, size(values) = 62,200 bytes, ratio = 498%
> (20873,) .. (21241,): 368 results, size(values) = 18,400 bytes, ratio = 147%
> (21241,) .. (21784,): 543 results, size(values) = 27,150 bytes, ratio = 217%
> (21784,) .. (21989,): 205 results, size(values) = 10,250 bytes, ratio = 82%
> (21989,) .. (22076,): 87 results, size(values) = 4,350 bytes, ratio = 35%
> (22076,) .. (22237,): 161 results, size(values) = 8,050 bytes, ratio = 64%
> (22237,) .. (22281,): 44 results, size(values) = 2,200 bytes, ratio = 18%
> (22281,) .. (23339,): 1,058 results, size(values) = 52,900 bytes, ratio = 423%
> (23339,) .. (23416,): 77 results, size(values) = 3,850 bytes, ratio = 31%
> (23416,) .. (23515,): 99 results, size(values) = 4,950 bytes, ratio = 40%
> (23515,) .. (25773,): 2,258 results, size(values) = 112,900 bytes, ratio = 903%
> (25773,) .. (25901,): 128 results, size(values) = 6,400 bytes, ratio = 51%
> (25901,) .. (26070,): 169 results, size(values) = 8,450 bytes, ratio = 68%
> (26070,) .. (26161,): 91 results, size(values) = 4,550 bytes, ratio = 36%
> (26161,) .. (26261,): 100 results, size(values) = 5,000 bytes, ratio = 40%
> (26261,) .. (26857,): 596 results, size(values) = 29,800 bytes, ratio = 238%
> (26857,) .. (27342,): 485 results, size(values) = 24,250 bytes, ratio = 194%
> (27342,) .. (27390,): 48 results, size(values) = 2,400 bytes, ratio = 19%
> (27390,) .. (27479,): 89 results, size(values) = 4,450 bytes, ratio = 36%
> (27479,) .. (27540,): 61 results, size(values) = 3,050 bytes, ratio = 24%
> (27540,) .. (28001,): 461 results, size(values) = 23,050 bytes, ratio = 184%
> (28001,) .. (28129,): 128 results, size(values) = 6,400 bytes, ratio = 51%
> (28129,) .. (29549,): 1,420 results, size(values) = 71,000 bytes, ratio = 568%
> (29549,) .. (29810,): 261 results, size(values) = 13,050 bytes, ratio = 104%
> (29810,) .. (31272,): 1,462 results, size(values) = 73,100 bytes, ratio = 585%
> (31272,) .. (31301,): 29 results, size(values) = 1,450 bytes, ratio = 12%
> (31301,) .. (33260,): 1,959 results, size(values) = 97,950 bytes, ratio = 784%
> (33260,) .. (33264,): 4 results, size(values) = 200 bytes, ratio = 2%
> (33264,) .. (33698,): 434 results, size(values) = 21,700 bytes, ratio = 174%
> (33698,) .. (34110,): 412 results, size(values) = 20,600 bytes, ratio = 165%
> (34110,) .. (34274,): 164 results, size(values) = 8,200 bytes, ratio = 66%
> (34274,) .. (34600,): 326 results, size(values) = 16,300 bytes, ratio = 130%
> (34600,) .. (35616,): 1,016 results, size(values) = 50,800 bytes, ratio = 406%
> (35616,) .. (35927,): 311 results, size(values) = 15,550 bytes, ratio = 124%
> (35927,) .. (36060,): 133 results, size(values) = 6,650 bytes, ratio = 53%
> (36060,) .. (36339,): 279 results, size(values) = 13,950 bytes, ratio = 112%
> (36339,) .. (36355,): 16 results, size(values) = 800 bytes, ratio = 6%
> (36355,) .. (37135,): 780 results, size(values) = 39,000 bytes, ratio = 312%
> (37135,) .. (37705,): 570 results, size(values) = 28,500 bytes, ratio = 228%
> (37705,) .. (38056,): 351 results, size(values) = 17,550 bytes, ratio = 140%
> (38056,) .. (38527,): 471 results, size(values) = 23,550 bytes, ratio = 188%
> (38527,) .. (39289,): 762 results, size(values) = 38,100 bytes, ratio = 305%
> (39289,) .. (39381,): 92 results, size(values) = 4,600 bytes, ratio = 37%
> (39381,) .. (39793,): 412 results, size(values) = 20,600 bytes, ratio = 165%
> (39793,) .. (39882,): 89 results, size(values) = 4,450 bytes, ratio = 36%
> (39882,) .. (40029,): 147 results, size(values) = 7,350 bytes, ratio = 59%
> (40029,) .. (40266,): 237 results, size(values) = 11,850 bytes, ratio = 95%
> (40266,) .. (41001,): 735 results, size(values) = 36,750 bytes, ratio = 294%
> (41001,) .. (41222,): 221 results, size(values) = 11,050 bytes, ratio = 88%
> (41222,) .. (41233,): 11 results, size(values) = 550 bytes, ratio = 4%
> (41233,) .. (41361,): 128 results, size(values) = 6,400 bytes, ratio = 51%
> (41361,) .. (41521,): 160 results, size(values) = 8,000 bytes, ratio = 64%
> (41521,) .. (41846,): 325 results, size(values) = 16,250 bytes, ratio = 130%
> (41846,) .. (42649,): 803 results, size(values) = 40,150 bytes, ratio = 321%
> (42649,) .. (42745,): 96 results, size(values) = 4,800 bytes, ratio = 38%
> (42745,) .. (43556,): 811 results, size(values) = 40,550 bytes, ratio = 324%
> (43556,) .. (43731,): 175 results, size(values) = 8,750 bytes, ratio = 70%
> (43731,) .. (43881,): 150 results, size(values) = 7,500 bytes, ratio = 60%
> (43881,) .. (44189,): 308 results, size(values) = 15,400 bytes, ratio = 123%
> (44189,) .. (44468,): 279 results, size(values) = 13,950 bytes, ratio = 112%
> (44468,) .. (44504,): 36 results, size(values) = 1,800 bytes, ratio = 14%
> (44504,) .. (44633,): 129 results, size(values) = 6,450 bytes, ratio = 52%
> (44633,) .. (44643,): 10 results, size(values) = 500 bytes, ratio = 4%
> (44643,) .. (44836,): 193 results, size(values) = 9,650 bytes, ratio = 77%
> (44836,) .. (45380,): 544 results, size(values) = 27,200 bytes, ratio = 218%
> (45380,) .. (45596,): 216 results, size(values) = 10,800 bytes, ratio = 86%
> (45596,) .. (45721,): 125 results, size(values) = 6,250 bytes, ratio = 50%
> (45721,) .. (45916,): 195 results, size(values) = 9,750 bytes, ratio = 78%
> (45916,) .. (45964,): 48 results, size(values) = 2,400 bytes, ratio = 19%
> (45964,) .. (46728,): 764 results, size(values) = 38,200 bytes, ratio = 306%
> (46728,) .. (48372,): 1,644 results, size(values) = 82,200 bytes, ratio = 658%
> (48372,) .. (48826,): 454 results, size(values) = 22,700 bytes, ratio = 182%
> (48826,) .. (48873,): 47 results, size(values) = 2,350 bytes, ratio = 19%
> (48873,) .. (50000,): 1,127 results, size(values) = 56,350 bytes, ratio = 451%
Statistics: smallest = 200 bytes, largest = 152,700 bytes, average = 23,148 bytes
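The summary line can be recomputed from the dump itself. Below is a small Python sketch (the helper names parse_chunk_sizes and summarize are hypothetical) that parses the per-chunk lines and summarizes them; keep in mind that N split points delimit N - 1 chunks.

```python
import re

# Matches one per-chunk line of the dump and captures the value size.
CHUNK_LINE = re.compile(
    r"^> \(\d+,\) \.\. \(\d+,\): [\d,]+ results, "
    r"size\(values\) = ([\d,]+) bytes"
)

def parse_chunk_sizes(dump):
    """Return the value size (in bytes) of every chunk line in the dump."""
    sizes = []
    for line in dump.splitlines():
        m = CHUNK_LINE.match(line.strip())
        if m:
            sizes.append(int(m.group(1).replace(",", "")))
    return sizes

def summarize(sizes, chunk_size):
    """Min/max/average chunk size, plus the chunk count that evenly sized
    chunks of chunk_size bytes would give for the same total."""
    total = sum(sizes)
    return {
        "chunks": len(sizes),
        "total": total,
        "smallest": min(sizes),
        "largest": max(sizes),
        "average": total / len(sizes),
        "expected_chunks": total / chunk_size,
    }

# A three-line excerpt of the dump; non-matching lines are ignored.
excerpt = """Found 109 split points
> (0,) .. (60,): 60 results, size(values) = 3,000 bytes, ratio = 24%
> (6868,) .. (9922,): 3,054 results, size(values) = 152,700 bytes, ratio = 1,222%
> (33260,) .. (33264,): 4 results, size(values) = 200 bytes, ratio = 2%"""
print(summarize(parse_chunk_sizes(excerpt), 12_500))
```

Fed the full dump above, this should report 108 chunks (109 split points delimit 108 ranges) totalling 2,500,000 bytes, an average of about 23,148 bytes, and 200 expected chunks for a 12,500-byte chunk size.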