It looks like all the bindings currently use the same core C++ code, including Java (via JNI) and Go (via cgo).
What magnitude of effort would be required to create an entirely separate implementation of the wire protocol? Would you ever consider documenting that protocol in order to make such a task more feasible?
One of the FoundationDB co-founders, who goes by “voidman” [edit: Dave Scherer] on HN, had this to say:
The client is complex and needs very sophisticated testing, so there is only one implementation. All the language bindings use the C library.
which is somewhat discouraging.
FoundationDB doesn’t have a strictly defined stable wire protocol, it’s whatever the structs are defined as in the source code, and this changes from version to version. This instability alone would make maintaining a separate client, even one written in C++, rather difficult.
Even with a stable wire protocol, implementing a client library from scratch would be more complex than just serializing the relatively simple get/set/commit API that the client library exposes. For example, the client maintains both the read-your-writes cache and a cache of which shards of data are located on which servers. That extra surface area would need to be tested, and it benefits from being included in FDB’s fault-injected simulation testing.
What Alex said. Is there a particular reason that you’re interested in writing your own client? The C interface is very stable, and designed to be easily wrapped by other language bindings.
Well, for languages that aren’t in the broader C ecosystem (Go, Java, etc.) it’s often not ideal to use C bindings.
For instance, in Go (what I’m most familiar with) using cgo comes with significant downsides:
- There’s an unavoidable performance cost to crossing the C/Go boundary.
- It’s harder to compile and cross-compile cgo-using code, and Go’s normally-super-fast build times suffer. Without cgo, it’s easy and normal to build and deploy static binaries; with cgo that gets much harder.
- Debugging and performance tools in the Go ecosystem typically don’t give insight into the cgo parts of the code.
- Writing correct cgo-using code requires expertise in both C and Go. In fact, it requires some understanding of Go’s implementation, such as how the unsafe package works (and cgo is itself unsafe, of course).
In Java-land, the tradeoffs might be different but the story is similar. There’s a reason that “pure-Go” and “pure-Java” are considered features.
There’s also some appeal to avoiding memory-unsafe dependencies if you’re in a memory-safe language (this also applies to Rust which – as I understand it – integrates more naturally with C code).
Anyway, using cgo isn’t a total showstopper for my use cases, but it’s something I would like to avoid long-term – hence this topic.
Thanks for the info – it certainly answers my question.
I think the way you get closest to what you want, without creating an enormous long term maintenance burden, is to first build a good, stable API for FoundationDB clients in whatever the best widely available async RPC protocol is. You can connect to it with basically no client at all and (at the cost of an extra network round trip in latency) use FDB.
Then you write, in your favorite language, a wrapper on top of that which uses the locality API to route read requests, in particular, to the physical machine that actually holds the data in storage, so that the extra network latency goes away. This latter step will still be tricky to get right, even if the RPC API is designed with this optimization in mind, because the client will need to keep the read-your-writes state if it isn’t sending all of a transaction’s operations to the same server. So you will still want to integrate with FoundationDB’s simulator, which will be hard for some languages. But in principle I think a client built along these lines could be only moderately slower than the native client and only slightly awful to maintain.
If someone was doing this successfully, I think I would support building the RPC API into fdbserver, so that there doesn’t need to be an extra service floating around and so that read requests sent to the right server can be swiftly routed internally without more serialization and transport overhead.
Why does the wire protocol keep changing if the API has stayed so compatible between versions?
I will give one example of the wire protocol changing recently that might highlight some of the complexities of the client, and why the wire protocol can change even though the client API does not.
The clients keep a cache of which storage servers are responsible for which ranges of keys. This mapping is stored in the system keyspace of the database (keys beginning with the \xff byte), and if a client attempts to read a key without knowing the responsible storage servers, it reads the mapping from the database just like any other key-value pair. However, there is a bootstrapping problem: to read the mapping, you need to know which storage servers contain the mapping. To solve this, the clients can ask the proxies for the storage servers responsible for the system keyspace.
The solution I just described relies on the fact that one set of storage servers is responsible for the entire system keyspace. However, there were a number of performance reasons why we wanted to split up the system keyspace and automatically shard that data just like the rest of the database. Since the previous design no longer worked once the system keyspace was split, we changed the protocol so that clients ask the proxies directly for the storage servers responsible for a range of keys, instead of reading that mapping from the database.
From the client’s perspective nothing has changed, but under the covers the caching mechanism was completely replaced. The changes I just described happened in the most recent 5.1 release, and I can think of many other examples.
FWIW, we have been using the C bindings in Java for years without issues. The only gripe is the invisibility of native memory to the JVM (you have to budget for it when sizing VMs/containers).
The newer memory-management model makes native bloat a thing of the past (although you basically have to manage it yourself), but we actually developed our own reference-counting mechanism that releases memory more aggressively (I know we could use try-with-resources blocks, but there are just too many call sites to fix).
Also, the fdb client is a very different “beast” compared to most clients (e.g. MySQL, HBase, ZooKeeper, Cassandra). It almost feels like it’s part of the cluster (it’s just not responsible for persistence).
Just saw in the Roadmap Summit slides the plans to support a wire protocol in the future. It would be nice if the authors could offer a glimpse into the motivations behind it, along with the design being considered.
An initial design document for this feature will be coming to a public location (our GitHub wiki, possibly) in the future prior to the feature being implemented. Please stay tuned, and we look forward to your feedback!