Registering new user-defined type codes in the tuple layer

The tuple layer documentation allows users to define user type codes, which can be used to extend the tuple layer to encode and decode user-defined classes.

To utilize this, one way is to

  1. check the FoundationDB tuple layer code into the application codebase
  2. add serialization logic for the user-defined class to the encode method
  3. add deserialization logic for the user-defined class to the decode method

Is this the right way of extending the tuple layer with user-defined types?

Or is there some way to “register” user-defined types (with their serialization logic) in the tuple layer on startup, so that they can then be used at runtime?
This would avoid checking the tuple layer code into the application codebase, which means less maintenance and easier dependency upgrades.

Note: I am not adding any Tuple::*_CODE in Tuple.cpp here.

I’ve given this some amount of thought in the past, and I think this more or less summarizes the options.

As it is currently, checking in the FDB tuple code and then hacking it up oneself is the only way of doing this. (Perhaps @KrzysFR will remember doing something along those lines back in the day.) And then one will have to do essentially what you laid out: add a custom serializer for the type and a custom deserializer.

However, I think a registry interface is something that could and perhaps should be added if it is decided that this is a feature worth having. (At the moment, then, the type codes are essentially aspirational.)

I think what you want in that registry is a way of adding custom encode and decode methods for a given type and type code. Unfortunately, the particulars would need to be very language specific. The API would probably be something like, “implement an interface and then add an instance of that interface to a registry”.

What exactly that interface looks like would require a little bit of thought, but it would definitely include an encode method (that takes an object of the given type and returns a byte string or writes to the end of an existing byte buffer) and a decode method (that takes a byte string or an offset into some byte buffer and then produces an object and specifies the next offset). It might also include an “encoded size” method (that takes an object and tells you how long its encoded form will be, which is used by the Java bindings and could be used by others).

Finally, there needs to be some way to tie an implementation to its type and type code. One option is “get type code” and “get type” methods, used to insert the item into the registry in the right place. Another is for the “register” method to take an implementation of the interface, a type code, and a type. Alternatively, the implementation could have “can encode” and “can decode” methods that take the type code and the object, and then the “right one” is selected from all the encoders/decoders. That last option is the most flexible, but it does not allow you to determine ahead of time whether multiple tuple encoder objects cover the same types/type codes.
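To make the shape of such a registry concrete, here is a minimal sketch in Python of the “register takes an implementation, a type code, and a type” variant. Nothing here is an existing FoundationDB API: `Codec`, `TypeRegistry`, and `Point` are hypothetical names, and the 0x40 to 0x4f range is my understanding of what the tuple layer spec reserves for user type codes. The payload is fixed width so the decoder can report the next offset without any escaping.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Assumption: the range the tuple layer spec reserves for user type codes.
USER_CODE_MIN, USER_CODE_MAX = 0x40, 0x4F


@dataclass(frozen=True)
class Codec:
    """Hypothetical codec: ties one type to one type code."""
    type_code: int
    py_type: type
    encode: Callable[[Any], bytes]                   # object -> payload bytes
    decode: Callable[[bytes, int], Tuple[Any, int]]  # (buffer, offset) -> (object, next offset)


class TypeRegistry:
    """Hypothetical registry, filled once at startup and consulted at runtime."""

    def __init__(self) -> None:
        self._by_code: Dict[int, Codec] = {}
        self._by_type: Dict[type, Codec] = {}

    def register(self, codec: Codec) -> None:
        if not USER_CODE_MIN <= codec.type_code <= USER_CODE_MAX:
            raise ValueError(f"type code {codec.type_code:#x} is outside the user range")
        if codec.type_code in self._by_code or codec.py_type in self._by_type:
            # Registering by explicit code and type lets collisions fail early.
            raise ValueError("type code or type is already registered")
        self._by_code[codec.type_code] = codec
        self._by_type[codec.py_type] = codec

    def encode(self, value: Any) -> bytes:
        codec = self._by_type[type(value)]
        return bytes([codec.type_code]) + codec.encode(value)

    def decode(self, buf: bytes, offset: int = 0) -> Tuple[Any, int]:
        codec = self._by_code[buf[offset]]
        return codec.decode(buf, offset + 1)


@dataclass(frozen=True)
class Point:
    x: int
    y: int


def encode_point(p: Point) -> bytes:
    # Two unsigned 32-bit big-endian integers: fixed width, 8 bytes.
    return struct.pack(">II", p.x, p.y)


def decode_point(buf: bytes, offset: int) -> Tuple[Point, int]:
    x, y = struct.unpack_from(">II", buf, offset)
    return Point(x, y), offset + 8


registry = TypeRegistry()
registry.register(Codec(0x40, Point, encode_point, decode_point))

encoded = registry.encode(Point(1, 2))
decoded, next_offset = registry.decode(encoded)
```

A real implementation would also have to hook into the binding's own pack and unpack loops, which is exactly the part that touches internals and raises the API stability question.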

But if this were to be added, some of these design decisions would need to be settled first (there are a couple of forks above, somewhat implicitly). I’d also be somewhat concerned about the API stability of this feature, as we’ve updated some of the internals of how encoding and decoding work (often for efficiency reasons). But that’s the high-level overview of the thoughts I’ve had so far.

Thanks, I have started hacking it already.

Yes

Thanks for the detailed design considerations.

If you think it’s worth it, I am more than happy to create a GitHub issue for this with a link to your post. I will track that issue and move my codebase over to such an API as and when it becomes available.

For now, I will hack it.

I implemented support for user-defined types in the tuple layer of the .NET binding, but never really used it, probably because I’m using a statically typed language where the caller provides the exact type to deserialize, while in dynamically typed languages, the implementation of the tuple layer usually creates the types by itself.

By that I mean that I will probably call TuPack.Decode<int, long, double, string, SomeEnum>(...) giving the exact type I want. This allows me to also define an interface ITupleSerializable on custom types, or use dependency injection to inject deserializers for custom types. C# now also has better support for tuples and type deconstruction than in the early days, which would make this even easier.
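The same idea of the caller supplying the target types can be sketched outside C#. The snippet below is a rough Python illustration, not the .NET binding's API: `decode_as` and `Color` are hypothetical names, and the input is assumed to be a tuple that the tuple layer has already unpacked into basic values.

```python
from enum import Enum
from typing import Any, Sequence, Tuple


class Color(Enum):
    RED = 1
    GREEN = 2


def decode_as(raw: Sequence[Any], types: Sequence[type]) -> Tuple[Any, ...]:
    """Convert already-unpacked tuple elements into the types the caller asks for."""
    if len(raw) != len(types):
        raise ValueError("expected %d elements, got %d" % (len(types), len(raw)))
    # Each requested type acts as its own converter, e.g. Color(2) -> Color.GREEN.
    return tuple(t(value) for value, t in zip(raw, types))


user_id, name, color = decode_as((7, "alice", 2), (int, str, Color))
```

Because the caller names the types, the stored bytes only ever contain standard tuple elements, and no user type code is needed.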

In practice, I’ve never liked custom types that much, because they pollute the types themselves with custom serialization methods (what if they also need to serialize to JSON, XML, or some other format?), and they also tie the types to my particular tuple implementation (there could be other libraries).

This also pollutes the application setup, and can be difficult when composing multiple components together (either they collide on the type id, or you need to call multiple “init” methods that each hook in their serializers).

And finally, this makes it impossible for tools or applications written in other languages to decode the content of the keys, which makes problems difficult to diagnose.

For simple custom types (composed of a few fields with basic types), I would prefer using embedded tuples.
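As a rough illustration of the embedded tuple approach, a small type can be flattened into a nested tuple before it goes into the key, and rebuilt on the way out. This sketch assumes a binding whose pack function accepts nested tuples (I believe the Python binding's `fdb.tuple.pack` does); `Point`, `to_key`, and `from_key` are hypothetical names, and the actual pack/unpack call is left out so the snippet runs without a cluster or the `fdb` package.

```python
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Point:
    x: int
    y: int


def to_key(user_id: int, point: Point) -> Tuple:
    # The custom type becomes a nested tuple of basic values: (x, y).
    return ("points", user_id, astuple(point))


def from_key(key: Tuple) -> Tuple[int, Point]:
    _, user_id, nested = key
    return user_id, Point(*nested)


key = to_key(42, Point(3, 4))           # would be passed to the binding's pack
user_id, point = from_key(key)          # would come from the binding's unpack
```

Since the key only contains standard element types, any tool or binding in another language can still decode it.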

For complex types I’d probably use a custom binary encoding anyway, which would be more compact than the tuple encoding.

Note that this is from the perspective of static languages. Things are probably dramatically different for dynamic languages. An example of that is the different approach taken to serialize versionstamps in tuples.