I am working on a use-case where it would be helpful to apply some bit-level operations to the VERSIONSTAMP_KEY (or VERSIONSTAMP_VALUE) placeholder bytes in my key/value (keeping the output length unchanged, to make these operations simpler for fdb).
The operations I have in mind are things like reverse_bits(), left_shift, right_shift, mask, etc., applied only to the versionstamp portion of the bytes. Or, to generalize: a function that takes 8 bytes as input (or 10, if the batch bytes can be made available) and returns 8 bytes post-transformation.
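To make the idea concrete, here is a minimal sketch of what such a length-preserving transformation over the 8 versionstamp bytes might look like. The function names are illustrative only, not an existing FDB API; the bit-reversal mirrors Java's `Long.reverse`.

```c
#include <stdint.h>

/* Load/store the 8 versionstamp bytes as a big-endian 64-bit integer,
   since versionstamps are ordered byte-wise. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v = (v << 8) | p[i];
    return v;
}

static void store_be64(uint8_t *p, uint64_t v) {
    for (int i = 7; i >= 0; i--) { p[i] = (uint8_t)(v & 0xff); v >>= 8; }
}

/* Reverse all 64 bits (equivalent to Java's Long.reverse). */
static uint64_t reverse_bits64(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) { r = (r << 1) | (v & 1); v >>= 1; }
    return r;
}

/* Example transformation: 8 bytes in, 8 bytes out, length unchanged. */
void transform_reverse(uint8_t out[8], const uint8_t in[8]) {
    store_be64(out, reverse_bits64(load_be64(in)));
}
```

A shift or mask transformation would follow the same 8-bytes-in/8-bytes-out shape, which is what keeps the operation simple for fdb.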
This opens up the possibility to come up with very creative uses of already versatile SET_VERSIONSTAMPED_KEY/SET_VERSIONSTAMPED_VALUE options! Note that these are not just optimizations (like predicate pushdown), but rather create options that were not at all possible earlier.
Do you think there is anything fundamentally wrong with this suggestion/approach? I believe that all keys and values are completely filled in before being handed to the resolver; in that case, the proxy would apply these transformations when filling in the versionstamp in the key or value, and from that point on no other change should be needed.
If the above thinking makes sense, how would one “pass” these transformations to the fdbserver process? I am not suggesting fully dynamic transformations decided at run-time. If there were an interface/contract that one could implement, compile, and provide to fdbserver as a dynamically linked library, and then reference by identifier as part of the key/value operation, would that be doable? We could even prohibit passing any runtime parameters to these transformations if that makes things more tractable. I am not well versed in the relevant C++ concepts, hence a lot of hand-waving.
I guess the more general question would be: can we have a user-provided function that modifies a mutation in the proxy at commit time (receiving some additional input from the proxy, such as the write version)? Do I understand your question correctly?
If so, I think this is probably hard to implement and the problem is kind of related to predicate pushdown (as in both of them need some form of function push down functionality). We currently simply don’t have a way of pushing a user-function to a server. Sending a shared-object file will probably meet a lot of resistance for several reasons:
Security: This is probably the biggest one. We don’t want clients to be able to inject random code into the server.
Robustness: If your user-function is buggy, we don’t want it to crash the server.
Portability: If your client runs on macOS and your server runs on Linux, you will need to be able to cross-compile (or give up this feature).
I guess the good news is that we want to find a solution that would allow for stuff like that. There are several ways this can be done in a safe way:
Integrate some kind of scripting language like JavaScript, Scheme, Lua… These can often be run in a sandbox which adds safety. But obviously speed could be a concern.
It might be possible to use LLVM to do this. This will, however, add significant complexity to FDB (and testing).
The Linux kernel implements eBPF to do these kinds of things, so @alexmiller had the idea to integrate this byte-code language into fdb. I don’t know how feasible this is, but if it works it would probably be my favorite choice.
But no matter what, it will take a while to get something like this implemented in a reliable way.
A less general approach might be to allow nesting atomic operations and to define some more atomic operations. The client could provide some form of template that the server would execute. This would not be as general as pushing down a user-defined function, but I guess it would solve many of the use-cases you are thinking about.
Yes, that is correct. However, I wanted to restrict this slightly to avoid the security and implementation-complexity issues that you mentioned. I was asking to be able to implement a set of fixed methods that follow some predefined fdb contract, place them alongside fdbserver, and load them into the fdb process when bringing it up. Think of it as a few more named operations, like set_versionstamp_key, that are implemented by fdb users. Clients only get to choose the id of the predefined operation (and optionally some encoded params in the key or value) at commit time.
I also tried to restrict the request to not get into any kind of filtering/predicate push down operations, as they have been discussed in detail earlier, and have their own set of challenges related to interfering with fdb caching.
Yes, this is closer to what I was hoping for, for now. With one change: users of fdb can implement these, as per some predefined contracts, and provide them to fdbserver as dynamic libraries at startup.
Ah I see, I misunderstood. I thought you wanted to send shared objects to the server, but what you really want is a plug-in system.
Basically, what we could do relatively easily is allow users to implement their own atomic functions in C libraries that can either be preloaded or accessed through dlopen.
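Such a plug-in contract might look roughly like the following. To be clear, nothing like this exists in FDB today; the symbol names, struct layout, and 10-byte in-place signature are all hypothetical, sketched only to show what the server would resolve after a dlopen.

```c
#include <stdint.h>

/* Hypothetical: a transformation applied in place to the 10 versionstamp
   bytes by the proxy when it fills in the versionstamp. Output length
   must equal input length. */
typedef void (*fdb_vs_transform_t)(uint8_t bytes[10]);

/* Hypothetical registration entry the plug-in library would export via
   a well-known symbol (the name "fdb_vs_transforms" is made up). */
typedef struct {
    uint32_t id;              /* operation id the client passes at commit */
    const char *name;         /* human-readable name for diagnostics */
    fdb_vs_transform_t apply; /* the transformation itself */
} fdb_vs_transform_entry;

/* Example operation: zero the low 16 bits of the 8-byte version portion
   (bytes 0..7 of the big-endian 10-byte versionstamp). */
static void mask_low16(uint8_t bytes[10]) {
    bytes[6] = 0;
    bytes[7] = 0;
}

static const fdb_vs_transform_entry fdb_vs_transforms[] = {
    { 1, "mask_low16", mask_low16 },
};
```

The client would then name only the operation id (here, 1) at commit time, matching the restriction above that no code crosses the wire.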
The main problem I see with this is that it adds complexity that we would need to support. It is also unclear to me what the interface to access these would look like and how we could keep it maintainable over a long period of time. This is why I would prefer to wait for a more general solution (which would probably be done together with predicate pushdown, if we do that).
Can you clarify a bit what the use-case would be that you have in mind?
I am trying to come up with a scheme where I can use the commit_version of the keys (keys are ordered by versionstamp) to derive a bucket in which each key should be placed. This would help distribute the keys across the key-space, to avoid write hot-spots. It would also let me ‘stitch’ the keys back together in proper order from the different buckets, with acceptable overhead.
Something like `change_log/bucket(version_stamp, width)/version_stamp/... -> ...`
The function is bucket(version, width) := Long.reverse(version >>> width), where width determines how quickly buckets change (width = 26, i.e. buckets of 2^26 versions, changes buckets roughly every ~64 seconds under the default version-generation rate of ~1M versions/sec). In the above example, width is fixed and needs to be passed in every time.
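The bucketing function above can be sketched as follows; `reverse_bits64` stands in for Java's `Long.reverse`, and `width` is interpreted as a bit-shift count (so a bucket spans 2^width versions):

```c
#include <stdint.h>

/* Reverse all 64 bits, equivalent to Java's Long.reverse. */
static uint64_t reverse_bits64(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) { r = (r << 1) | (v & 1); v >>= 1; }
    return r;
}

/* Drop the low `width` bits of the commit version, then reverse the
   bits so that consecutive buckets scatter across the key space
   instead of clustering at one end. */
uint64_t bucket(uint64_t version, unsigned width) {
    return reverse_bits64(version >> width);
}
```

Two versions less than 2^width apart land in the same bucket, while crossing a 2^width boundary moves the key to a bucket far away in the key space, which is what spreads the write load.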
At the moment I am using a round-about way to achieve something close to the above, which results in complex code and conflicts.