This is something I’ve wanted to do for a while.
@andrew.noyes A full prefix check on every consecutive key pair of every set of keys costs too much CPU, but there are better ways.
If a set of keys (or things containing keys) is sorted (and in most cases in FDB this is true), there is a very cheap way to get much of the benefit for very little send-side CPU. You can compare the first key to the last key to get a common prefix to factor out, and the message format becomes roughly
numItems | commonPrefixLen | commonPrefixBytes | [itemsWithKeySuffixes...]
If the set is not sorted, then
CommonPrefixLen is 0 by default, and on the decoding side that case can be shortcut to not do any concatenation to restore the keys when decompression is requested.
If the creator of the set knows that although the set isn’t sorted it does have a common prefix, then the common prefix could be initialized explicitly.
For a sorted set, there is also a way to produce the full list of pair-wise prefix lengths by starting with the first-last common prefix and bisecting the set recursively, passing each enclosing set’s common prefix as a starting point for prefix detection in its subsets. Redwood essentially does this to generate its prefix compressed binary trees.