Getting a realtime feed of changes

I care a lot about realtime editing and I think all applications should default to showing a ‘live’ view of the data you’re looking at. Right now this is really hard to do on top of FoundationDB. I have some ideas on how to fix it, but I’d be interested in other people’s thoughts.

As an example, I want to be able to write a tool which shows a live view of all the directories in FoundationDB and lets a user drill in and see the documents inside each directory. I want that view to update live as documents are inserted, removed, or updated. Ideally, I’d like to be able to implement this sort of thing without changing application code (or the directory layer), because the directory layer is pretty complicated.

You could see live changes to the directory layer if you had a feature to watch ranges of keys. But ideally I’d also like to be able to associate metadata with those changes, like information about the user / service which made the change, so any clients which are viewing the change can show that.

I could do this in a layer which sits under the API that things like the directory layer consume. I’m imagining something like this:

  • We specify a directory (eventually, ideally it should be moved to \xff\xff or something). This directory contains documents which specify watched key ranges.
  • For each watched range, we have a configuration document that specifies some properties - a prefix for log entries themselves and expiry times.
  • Every time a client commits a transaction, if that transaction modifies any documents inside a watched range, the commit will also append a log entry for that watched range listing the keys that were changed with associated metadata (if any).
  • I’m imagining two kinds of watched ranges: ranges which are per-client (and should be deleted automatically if the client goes away), and ranges which are expected to be shared by multiple clients. For shared ranges, we can specify a TTL on the log entries themselves, so the log is a sliding window of the last 10 minutes of changes or something. I’m not sure if the TTL should be specified as a time or using version numbers.
  • If the expiry time on any log elapses, any client can delete (garbage collect) it.
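
A minimal sketch of the commit hook described in the list above, using an in-memory model rather than the real bindings. Every name here (`WatchedRange`, `LogEntry`, `LoggingStore`, the `log/docs/` prefix) is hypothetical and not part of the FoundationDB API, and a plain counter stands in for the commit version. Expiry is left out to keep the example short.

```typescript
// Hypothetical in-memory model of the proposed commit hook.
// None of these names are part of the FoundationDB API.

interface WatchedRange {
  begin: string;     // inclusive start of the watched key range
  end: string;       // exclusive end of the watched key range
  logPrefix: string; // prefix under which this range's log entries are stored
}

interface LogEntry {
  keys: string[];                   // keys in the watched range that changed
  metadata: Record<string, string>; // e.g. the user or service that made the change
}

class LoggingStore {
  data: Record<string, string> = {};
  log: Record<string, LogEntry> = {};
  private version = 0; // stand-in for the commit version

  constructor(private watches: WatchedRange[]) {}

  // Apply the writes, then append one log entry per watched range they touch.
  commit(writes: Record<string, string>, metadata: Record<string, string> = {}): void {
    const changed = Object.keys(writes).sort();
    changed.forEach((k) => { this.data[k] = writes[k]; });
    this.version += 1;
    // Zero-padded so that log keys sort in commit order.
    const suffix = ("0000000000" + this.version).slice(-10);
    for (const w of this.watches) {
      const keys = changed.filter((k) => k >= w.begin && k < w.end);
      if (keys.length > 0) {
        this.log[w.logPrefix + suffix] = { keys, metadata };
      }
    }
  }
}

// "docs0" is the first key after everything prefixed with "docs/".
const store = new LoggingStore([{ begin: "docs/", end: "docs0", logPrefix: "log/docs/" }]);
store.commit({ "docs/a": "1", "other/x": "2" }, { user: "alice" });
store.commit({ "other/y": "3" }); // touches no watched range, so nothing is logged
```

In the real thing the data writes and the log append would have to happen inside the same transaction, so that a log entry exists if and only if the change committed.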

For per-client range watches, the expiry time on the log itself is there so if the client crashes, we don’t end up with a buildup of crap in the database. Each client can periodically advance the expiry time of all their own range watches to keep them active. And clients should delete log entries as they consume them.
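
The lease idea for per-client watches could look roughly like this. Again this is an in-memory sketch with hypothetical names (`ClientWatch`, `WatchRegistry`), not real binding code. Time is passed in as a plain number, so it could equally be a wall-clock timestamp or a version number.

```typescript
// Hypothetical sketch of per-client watch leases; not FoundationDB API.

interface ClientWatch {
  expiresAt: number; // lease deadline; the owning client pushes this forward
  entries: string[]; // pending log entries, oldest first
}

class WatchRegistry {
  watches: Record<string, ClientWatch> = {};

  constructor(private leaseLength: number) {}

  create(id: string, now: number): void {
    this.watches[id] = { expiresAt: now + this.leaseLength, entries: [] };
  }

  // Called periodically by the owner. Returns false if the watch is already
  // gone, in which case the client has to start over with a fresh watch.
  renew(id: string, now: number): boolean {
    const w = this.watches[id];
    if (!w) return false;
    w.expiresAt = now + this.leaseLength;
    return true;
  }

  // Called by committing clients; a missing watch is silently skipped.
  append(id: string, entry: string): void {
    const w = this.watches[id];
    if (w) w.entries.push(entry);
  }

  // The owner reads its pending entries and deletes them in the same step.
  consume(id: string): string[] {
    const w = this.watches[id];
    if (!w) return [];
    return w.entries.splice(0, w.entries.length);
  }

  // Any client may call this to delete watches whose lease has elapsed.
  collectExpired(now: number): string[] {
    const expired = Object.keys(this.watches).filter(
      (id) => this.watches[id].expiresAt <= now
    );
    expired.forEach((id) => { delete this.watches[id]; });
    return expired;
  }
}

const registry = new WatchRegistry(1000);
registry.create("client-a", 0);
registry.create("client-b", 0); // this client "crashes" and never renews
registry.append("client-a", "docs/x changed");
registry.renew("client-a", 900);                     // now expires at 1900
const collected = registry.collectExpired(1500);     // only client-b has lapsed
```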

A naive implementation of this would need to fetch the configuration information in each commit - which is pretty slow. But we can do better than that by having all clients watch the configuration entries using this very feature. To make that work, we set up a global log for watch configuration changes. When a client starts up it fetches the watch configuration and stores that locally, and keeps that local configuration object up to date.
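
A rough sketch of that local configuration cache, assuming the global configuration log hands back entries in ascending version order. `ConfigChange` and `ConfigCache` are made-up names for illustration only.

```typescript
// Hypothetical sketch of a client-side cache of the watch configuration.

interface ConfigChange {
  version: number;          // position in the global configuration log
  range: string;            // identifies the watched range
  logPrefix: string | null; // null means the watch was removed
}

class ConfigCache {
  ranges: Record<string, string> = {}; // watched range -> log prefix
  version: number;

  // Start from a snapshot of the configuration read at the given version.
  constructor(snapshot: Record<string, string>, version: number) {
    Object.keys(snapshot).forEach((k) => { this.ranges[k] = snapshot[k]; });
    this.version = version;
  }

  // Apply configuration log entries newer than what is already cached.
  // Returns how many entries were applied.
  apply(changes: ConfigChange[]): number {
    let applied = 0;
    for (const c of changes) {
      if (c.version <= this.version) continue; // already reflected in the cache
      if (c.logPrefix === null) {
        delete this.ranges[c.range];
      } else {
        this.ranges[c.range] = c.logPrefix;
      }
      this.version = c.version;
      applied += 1;
    }
    return applied;
  }
}

const cache = new ConfigCache({ "docs/": "log/docs/" }, 5);
const appliedCount = cache.apply([
  { version: 4, range: "docs/", logPrefix: null },          // stale, ignored
  { version: 6, range: "users/", logPrefix: "log/users/" }, // new watch
  { version: 7, range: "docs/", logPrefix: null },          // watch removed
]);
```

This does not deal with a commit racing a configuration change: a client whose cache is slightly stale could commit without writing a log entry for a watch that was just added.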


I’ve described this as a change that could be made inside the bindings, because I maintain the nodejs bindings, so I can easily change that code. But I’m not sure if that’s actually the best place. Implemented as I’ve described it, this change would need to be implemented independently in all the bindings (which is a hassle). If any clients weren’t inserting log entries, it would mess the whole thing up. And to actually notify consumers without polling, each watched range would need a single ‘current version’ document or something that other clients can watch. That document would quickly become a bottleneck at scale.

Also, the server already has and maintains a log of operations for replica sets. It’s just currently private.

Anyway, I’m interested in people’s thoughts on this. I might code up a little proof of concept in the node bindings and goof around with it.

And does this feature interest anyone else?
