In my journey to use FDB with JVM-based big data tools, one common problem I run into is that the Java binding classes don’t implement ‘Serializable’. This causes both Spark and Flink to throw an exception, because they cannot ship the objects over the wire to the workers.
Is there any specific reason that the Java bindings do not implement ‘Serializable’? Is it possible to make them all serializable?
Which classes are you thinking of, in particular? Many of the classes (for example, FDBTransaction, which implements the Transaction interface, as well as DatabaseOptions referenced in the issue) are actually just thin shells with little JVM state except for a pointer to objects in the native client, so a proper serialization would also need to include the C++ objects.
There may be some that could theoretically be Serializable, though I’m not sure we’ve thought about, for example, which fields need to be transient, etc.
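To make the point about thin shells and transient fields concrete, here is a minimal sketch in plain Java. Nothing in it comes from the FDB bindings: NativeHandle, HandleFactory, and SerializableHandleDemo are hypothetical names, and the "pointer" is a placeholder value rather than a real native allocation. The idea is that only plain configuration is serialized, while the native-backed handle is marked transient and re-created lazily in whichever JVM the object ends up in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a binding class that wraps a native pointer.
// It is deliberately NOT Serializable: the pointer is only meaningful in this process.
final class NativeHandle {
    final long pointer;        // pretend address of a C++ object
    final String clusterFile;  // configuration the handle was opened with

    NativeHandle(String clusterFile) {
        this.clusterFile = clusterFile;
        this.pointer = System.nanoTime(); // placeholder for a real native allocation
    }
}

// Serializable wrapper: ships only the configuration and reopens the handle on demand.
final class HandleFactory implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String clusterFile;       // plain data, safe to serialize
    private transient NativeHandle handle;  // skipped by Java serialization

    HandleFactory(String clusterFile) {
        this.clusterFile = clusterFile;
    }

    // Creates the handle on first use in the current JVM.
    synchronized NativeHandle get() {
        if (handle == null) {
            handle = new NativeHandle(clusterFile);
        }
        return handle;
    }

    synchronized boolean isOpen() {
        return handle != null;
    }
}

public class SerializableHandleDemo {
    // Round-trips a value through Java serialization, roughly what a
    // framework does when it ships a closure to a worker.
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        // The path is only an example value; nothing is opened on disk here.
        HandleFactory local = new HandleFactory("/etc/foundationdb/fdb.cluster");
        local.get();                                  // handle exists on the "driver"
        HandleFactory remote = roundTrip(local);      // simulate shipping to a worker
        System.out.println(remote.isOpen());          // transient field was not carried over
        System.out.println(remote.get().clusterFile); // handle is rebuilt from the config
    }
}
```

In a real Spark or Flink job the body of get() would presumably open the database on the worker (for instance through FDB.selectAPIVersion(...).open()), but that wiring is an assumption on my part and not something the bindings provide out of the box.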
I think just implementing Serializable would be a great start.
I understand that FDB doesn’t want to get into direct integration with analytical tools, but for a company deciding whether to use FDB, it’s important to know that doing that integration work themselves is at least not very hard.
It would be cool if FDB had a side repo with simple examples of how to integrate the database with heavily used analytical frameworks like Spark and Flink. It could help everyone understand what needs to be included in the main project bindings, such as the level of serializability, async frameworks, etc.