Show FDB: A scalable realtime text editor on top of foundationdb

(Seph Gentle) #1

Hi all!

I thought you might like to see something neat - I’ve been working lately on a realtime data processing pipeline / event sourcing system called statecraft. Over the last few days I’ve added foundationdb backend support.

That means I can seamlessly run my little collaborative text editing demo on top of a foundationdb cluster. Editor demo here!

The editor allows any document name after /edit in the URL and creates documents on the fly. It has full reconnection support, and it’ll render whatever you type as markdown.

Apologies in advance if people have vandalised the page (there’s no access control) or if the server has gone down (there are still bugs - this is POC territory).

Background & implementation

If you’re interested in some background on this, I worked on Google Wave back in 2011, and then when that got cancelled I reimplemented my own small version of a lot of the tech into ShareJS, which became ShareDB for arbitrary realtime document editing. Foundationdb was the first database I wanted to build sharedb on top of because of its rock-solid fundamentals. But then FDB disappeared and I was bullied by my company into using redis + mongodb instead.

Anyway, statecraft is in many ways the latest iteration of this long path. It supports … actually, I’ll hold that explanation for later. I’m very pleased with it though.

In terms of realtime support with FDB, it leans heavily on fdb’s versionstamps. Every operation is added to a queue (op/VERSIONSTAMP). The documents themselves are stored with the versionstamp prefixed at the front. The client knows the versionstamp of the document snapshot it’s looking at, and when it sends a change to the server I check for conflicts per-key on top of FDB. (code) Multiple users can edit different documents concurrently without conflicts. But if two users do conflict, I read out the operation log and do a separate eventual consistency pass. (Code, though it is far from straightforward)
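Roughly, the key/value layout and the conflict check look like this. This is an illustrative sketch, not statecraft’s actual code - the names (`opKey`, `snapshotValue`, `isConflict`) are made up, and a real FDB versionstamp is a 10-byte value assigned at commit time, which here we just treat as raw bytes:

```typescript
// Sketch of the layout described above. An FDB versionstamp is a 10-byte
// value assigned at commit time; here it's just an opaque byte array.
type Versionstamp = Uint8Array

// Each operation is queued under an opaque, monotonically increasing key.
function opKey(stamp: Versionstamp): Uint8Array {
  const prefix = new TextEncoder().encode('op/')
  const key = new Uint8Array(prefix.length + stamp.length)
  key.set(prefix, 0)
  key.set(stamp, prefix.length)
  return key
}

// Document snapshots store the versionstamp of the last applied operation
// prefixed to the document body, so readers learn the version for free.
function snapshotValue(stamp: Versionstamp, doc: Uint8Array): Uint8Array {
  const val = new Uint8Array(stamp.length + doc.length)
  val.set(stamp, 0)
  val.set(doc, stamp.length)
  return val
}

// A client submitting a change names the versionstamp it read. The write
// conflicts if the stored snapshot has moved past that version.
function isConflict(stored: Versionstamp, clientRead: Versionstamp): boolean {
  for (let i = 0; i < stored.length; i++) {
    if (stored[i] !== clientRead[i]) return stored[i] > clientRead[i]
  }
  return false
}
```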

For realtime messaging, each write operation also writes into a big shared global version key (with conflicts turned off, though this might still be a bottleneck in big systems). Then each frontend server watches that global version key. Whenever it changes, it reads all client operations since that version and sends them to all connected listening clients.

It’s a bit ugly to use a single global current-version key. I expect it would be more efficient if I could just follow the raw fdb oplog directly, or watch the versionstamp some other way. With FDB the way it is now, I could instead make a whole set of ‘latest version’ keys scattered throughout the keyspace and have each frontend server watch all of them and take the max value when they change. But, as I said, this is a proof of concept. Eventually I want to rewrite the backend in rust, though I’m hoping futures land in stable before that happens (and then that someone makes a rust FDB binding layer which adapts FDB futures into rust futures).

The current code also re-stores the whole text document with every edit, but this is just because I haven’t tuned it. It should store a collection of recent edits alongside each document, and only bake the edits back into the document itself in the database when there are enough of them that it makes sense. That way writes would be amortized O(1) in the size of the document.
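That scheme would look something like this - a sketch with made-up names, where the compaction threshold and the edit encoding are placeholders:

```typescript
// Sketch of the amortized storage scheme: keep recent edits next to the
// document and only rewrite the full document every COMPACT_EVERY edits.
const COMPACT_EVERY = 16 // illustrative threshold

type Edit = string // stand-in for the OT type's real op encoding

interface StoredDoc {
  snapshot: string // last baked document text
  pending: Edit[]  // edits accumulated since the snapshot was baked
}

// Stand-in for the real OT apply function.
function applyEdit(doc: string, edit: Edit): string {
  return doc + edit
}

function writeEdit(stored: StoredDoc, edit: Edit): StoredDoc {
  const pending = [...stored.pending, edit]
  if (pending.length < COMPACT_EVERY) {
    // Cheap path: just append the edit, O(1) in document size.
    return { snapshot: stored.snapshot, pending }
  }
  // Every COMPACT_EVERY edits, pay O(document size) once to re-bake.
  return { snapshot: pending.reduce(applyEdit, stored.snapshot), pending: [] }
}

// Readers reconstruct the current document from snapshot + pending edits.
function readDoc(stored: StoredDoc): string {
  return stored.pending.reduce(applyEdit, stored.snapshot)
}
```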

Enjoy! And AMA!

(Amirouche) #2

Hello joseph!

Does this have anything to do with things like kafka et al.?

What about CRDTs? I was under the impression that they were the gist of google wave. How do you solve the problems CRDTs solved in google wave with FDB?

Do you use ‘watches’ with FDB?

(Seph Gentle) #3

Sort of - kafka is a system for storing and forwarding changes. Statecraft is more of a database - or rather, a database construction kit. You design how data moves around, then you can query it like any other database (or, if you set it up correctly, serve HTML or whatever straight from the data store).

Google wave didn’t use CRDTs as we know them today. The Logoot paper only came out in late 2009, well after development was underway on wave. And that only implemented a CRDT for plain text - nobody had done anything like it for the rich text / XML-like structures that wave used. So we used ‘plain’ OT based on a paper from 1995. Google docs at the time didn’t even do that - its logic simply wasn’t correct, and with an unlucky sequence of network connections & disconnections, google docs at the time would start showing you incorrect document states.

Just like in Riak, conflict resolution in statecraft is entirely up to the developer. The idea is that OT or CRDT or whatever can be implemented as a layer on top of your root store, configured however you like. I’ve implemented a layer for OT, but you should be able to use whatever conflict resolution system you want.

(Alex Miller) #4

I enjoyed vandalizing your demo, and look forward to vandalizing your future demos. :smiley:

I’m also really interested in hearing more on how realtime web things work out on FDB. I understand that it’s an important thing for web, and I’m interested in feedback on how useful/not useful watches and their semantics are for you.

Would you mind elaborating on a couple parts of your design:

  1. Why are you doing a global queue for all documents, rather than one queue of operations per document? Do multi-document transforms exist? It seems like one-queue-per-document would simplify/solve your scaling issues for realtime watches?

  2. What I remember reading of Wave’s design was that the OT side was largely used on the client as a way to rebase unsynced operations onto the server’s current state of the document. It appears that’s not what you’re doing, but a quick read through of the eventual consistency pass implies that there are situations where you’ll throw parts of edits away. Does it not come out as odd from a UX perspective to have a server “accept” edits from a client, but conflicts from other users cause the edit to be thrown away once the document is synced back to the client? Has OT/CRDT theory advanced sufficiently since the days of wave that both maintaining enough document history to merge transforms and doing the merge reliably are now feasible?

(Sounds like quite a journey you’ve had since your Google times… would you call it… a walkabout? :wink: )

(Seph Gentle) #5

:sweat_smile: that sure takes me back!

… In the process of replying to you I’m starting to rethink my design.

The one-queue-per-document model would make more sense if all I were doing with statecraft was remaking google wave, or a similar collaborative editor. And it’s exactly what I did with ShareJS / ShareDB.

But one of the big downsides to sharedb we ran into is that it doesn’t support multi-document transactions. I want statecraft to be useful for more of the stuff you use databases for, so multi-document txn support is important. Statecraft also supports range queries and range query subscriptions. So right now statecraft imagines all writes going into a single big transaction log, and frontend servers filter operations out of that based on the active document sets of their clients.

The current design will work fine for read-heavy workloads or workloads with a small active data set. But it won’t scale to a large, sparse set of keys (like google docs). But thinking about it now I might be able to have my cake and eat it too. I could leave the write path more or less alone, still supporting multi-key transactions. But then store the operation log of each key separately. Then the frontend servers could set up a whole set of watches, watching the union of all keys which have active subscriptions.

But I might run into active watch count limits. And you can’t watch a range. But I could get around both of those problems by reducing the set of keys to a pool of 10-100 separate op logs. Each frontend server could hash & truncate the active set of keys of its clients, then maintain a watch for each key in its active set of subscriptions. That would help distribute the load in a FDB cluster, and reduce the unnecessary work of receiving, decoding then filtering out operations in frontend servers. Range subscriptions would be trickier - I might need to make it configurable based on what you’re doing.
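The hash & truncate part would be something like this - a sketch where the pool size and hash function are arbitrary choices, not anything statecraft actually does yet:

```typescript
// Sketch of the "pool of op logs" idea: hash each document key into one of
// NUM_OPLOGS shared logs, so a frontend needs at most one watch per bucket
// its clients touch, rather than one watch per document.
const NUM_OPLOGS = 64 // illustrative pool size

// Simple FNV-1a string hash; any stable hash would do.
function hash(key: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return h >>> 0
}

function oplogBucket(docKey: string): number {
  return hash(docKey) % NUM_OPLOGS
}

// A frontend reduces its clients' active document set to the set of
// buckets it needs to watch - at most NUM_OPLOGS watches total.
function bucketsToWatch(activeDocs: string[]): Set<number> {
  return new Set(activeDocs.map(oplogBucket))
}
```

The tradeoff is that each frontend receives (and must filter out) operations for other documents that happen to share a bucket, but the watch count is bounded regardless of how many documents are active.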

A document operation in statecraft is Set(newval), Remove() or some registered custom operation type. For that demo, all the typing operations are ot-text-unicode operations. If all operations are text operations, no data will be lost. But there’s a weird situation that arises when one user does a set or remove on the entire document, and another user concurrently edits the document’s contents using OT. This sort of thing is impossible in that demo, but it could happen, for example, if you wanted to run some sort of schema migration. Eventually I want the default operation type to be json1 OT operations, and then schema migrations could also be resolved using operational transform. It’s also detectable in the client (since the conflict error propagates all the way up). So if the app authors wanted to resolve it differently, they could - “Looks like the document you were editing was deleted! Want to save your recent edits?”
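To make that concrete, here’s a sketch of how that conflict could surface. The types and the `transform` function are illustrative (not statecraft’s real OT layer) - the point is that set/remove racing a text edit produces an explicit outcome the app can handle, instead of the edit silently vanishing:

```typescript
// Sketch: a whole-document Set/Remove racing a concurrent text edit.
type DocOp =
  | { type: 'set', value: string }
  | { type: 'remove' }
  | { type: 'text', edit: string } // stand-in for an ot-text-unicode op

type Outcome =
  | { ok: true, op: DocOp }
  | { ok: false, reason: string } // propagated up to the client

// Transform `op` so it applies after `concurrent` has already committed.
function transform(op: DocOp, concurrent: DocOp): Outcome {
  if (op.type === 'text' && concurrent.type === 'text') {
    // Two text edits: the real OT transform runs here; never lossy.
    return { ok: true, op }
  }
  if (concurrent.type === 'set' || concurrent.type === 'remove') {
    // The document was replaced or deleted out from under this edit -
    // surface the conflict so the client can ask the user what to do.
    return { ok: false, reason: 'document replaced by concurrent set/remove' }
  }
  // A set/remove wins over a concurrent text edit.
  return { ok: true, op }
}
```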