Hello,
I spent some time since the FoundationDB open sourcing announcement last week putting together the first, crude version of a FoundationDB storage adapter for the JanusGraph graph database. JanusGraph is a Linux Foundation-hosted fork of TitanDB. We’ll be evaluating it to see whether FDB is a good fit for the JanusGraph access patterns and use cases, but on the face of it, Janus lacks a good, distributed ACID option, and I think FDB is quite compelling in that department. I plan on picking the experts’ brains here as I get a bit further along, to make sure my implementation and data modeling are sympathetic to FDB best practices.
In the meantime, if anyone is interested in living dangerously, the work in progress can be found here.
Before the Apple acquisition, a FoundationDB employee had been working on a Titan storage adapter for FDB. As I recall, the trouble we ran into was that Titan’s execution engine wasn’t architected in a way that allowed for easy pipelining of reads. This in turn made it tricky to get satisfactory latency from a distributed storage implementation.
I’d be really curious to hear whether this issue has been fixed, and if not then whether you figure out a clever way to work around it. Please do let us know if we can be helpful with any of this!
This sounds like a terrific idea! I’m very glad to see projects like this. Please keep us up to date on where you get with this… I’m sure lots of folks on the forums will be happy to give advice.
The first thing that came to mind as a pitfall is pretty much exactly what Will said: the simplest way to satisfy a read API will be to create a new transaction for every read (or do the reads purely sequentially). That will probably work, but it may leave you wanting more speed. Two good practices in general for any use of the key-value API are:
1. Each transaction does a good number of reads. Transactions need to do the “get read version” step at the start, and it’s best if that cost is amortized over many reads, each of which is very cheap.
2. Lots of reads are “outstanding” at the same time. This will almost always take the form of lots of reads issued on a Transaction (either get() or getRange()) with the resulting Future objects put into some container. If the code creates many Futures before calling get() on any one of them, the database will process them all in parallel (see the sketch after this list).
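To make that concrete, here’s a minimal sketch of both practices using the FDB Java bindings: many point reads issued inside one transaction, with all the futures collected before any of them is awaited. The `vertexIds` and the `("vertex", id)` key layout are invented for illustration; they aren’t the adapter’s actual data model.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.tuple.Tuple;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class BatchedReads {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(510);
        try (Database db = fdb.open()) {
            long[] vertexIds = {1L, 2L, 3L, 4L};

            List<byte[]> values = db.run((Transaction tr) -> {
                // Issue every read up front; get() returns a future
                // immediately, so nothing blocks here and the single
                // get-read-version cost is shared across all the reads.
                List<CompletableFuture<byte[]>> futures = new ArrayList<>();
                for (long id : vertexIds) {
                    futures.add(tr.get(Tuple.from("vertex", id).pack()));
                }
                // Only now wait on the results; the reads were already
                // in flight and are serviced by the cluster in parallel.
                List<byte[]> results = new ArrayList<>(futures.size());
                for (CompletableFuture<byte[]> f : futures) {
                    results.add(f.join());
                }
                return results;
            });

            System.out.println("Fetched " + values.size() + " values in one transaction");
        }
    }
}
```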
We now have the ability to issue parallel, asynchronous read requests against our storage layer of choice within the context of a single transaction. This isn’t on by default yet, but I think we’ll have it in good enough shape fairly soon that it will be. As you all have pointed out, this greatly helps with the death-by-a-thousand-sequential-reads situation that arises if we lazily execute the I/O as we traverse the graph. I’ll continue to refine the first-cut implementation and report back with progress, updates, and questions.
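For anyone curious what this can look like at the FDB layer, here’s a hypothetical sketch of fetching several vertices’ edge lists concurrently within one transaction using range reads. The `("edges", vertexId)` key layout is an assumption made up for this example, not the adapter’s actual encoding.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.KeyValue;
import com.apple.foundationdb.Range;
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.tuple.Tuple;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ParallelRangeReads {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(510);
        try (Database db = fdb.open()) {
            long[] vertexIds = {1L, 2L, 3L};

            List<List<KeyValue>> edgeLists = db.run((Transaction tr) -> {
                // Start one range read per vertex; asList() kicks off
                // the read and hands back a future without blocking.
                List<CompletableFuture<List<KeyValue>>> futures = new ArrayList<>();
                for (long id : vertexIds) {
                    byte[] prefix = Tuple.from("edges", id).pack();
                    futures.add(tr.getRange(Range.startsWith(prefix)).asList());
                }
                // All range reads are now in flight in parallel; gather
                // the results instead of walking the graph one hop at a time.
                List<List<KeyValue>> results = new ArrayList<>(futures.size());
                for (CompletableFuture<List<KeyValue>> f : futures) {
                    results.add(f.join());
                }
                return results;
            });

            System.out.println("Fetched edges for " + edgeLists.size() + " vertices");
        }
    }
}
```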