I did a bit of thinking on the way back from FDB Summit about how to implement predicate pushdown at the storage process layer.
Today the only option is to check the layout of the keyspace and send requests outside of FDB to a layer process running on the same machine. This limits you to only the data stored on that machine, which is useful for some applications but probably not all. It also requires every layer to implement that router and correctly handle failures that are already handled by FDB.
My idea for implementation is to extend the
transaction_get_range API to add the ability to send arbitrary predicate data and an arbitrary filtering process name as a part of the
transaction_get_range API call
FDBFuture* fdb_transaction_get_range_predicate( FDBTransaction* transaction, uint8_t const* begin_key_name, int begin_key_name_length, fdb_bool_t begin_or_equal, int begin_offset, uint8_t const* end_key_name, int end_key_name_length, fdb_bool_t end_or_equal, int end_offset, uint8_t const** predicate_filter_names, uint8_t const** predicates, int limit, int target_bytes, FDBStreamingMode mode, int iteration, fdb_bool_t snapshot, fdb_bool_t reverse )
foundationdb.conf, you would specify predicate filter processes like this
[predicate_filters.FILTER_NAME] command = /usr/bin/... port = 7654
FILTER_NAME is the value used in
uint8_t const** predicate_filter_names. This would allow multiple different filter processes to live on the same machine at different ports.
Some protocol would be established to send each KV pair to the filter process, along with the data from the corresponding
uint8_t const** predicates index. It would parse the predicate in whatever way it wants to for the layer’s goal (regex, comparing numbers, etc.) and return
false to indicate whether the data should be filtered from the storage server response.
A new client library and set of bindings could be developed to make this part easier on developers. From there, they would just implement the parser for their layer’s predicate payload and execute the logic to filter based that payload.
How this interacts with r/w conflict ranges
Conflict checking would remain the same, this is just a performance optimization. Any data not returned to the client is still assumed to be read during the transaction.
Downsides to this approach
- Requires work from layer developers
- Copying data around internally more than before
- More syscalls moving data around than if it were in-process
Benefits to this approach
- Doesn’t allow CPU-hungry filters to block the Flow thread
- Allows arbitrary filters to be implemented
- Allows developers to use any language
Did I miss anything critical that would prevent this from moving forward? Has there been any discussion in the past on how best to implement this?