I’m currently using RethinkDB, but watching FoundationDB with great interest. I would like to ask about a use case involving changefeeds, which I think are a fantastic (but poorly understood) feature of RethinkDB. From what I can tell, the following cannot currently be achieved with FoundationDB:
get a range of keys and set a watch on that range in a single transaction
receive updated values when any of the watched keys is changed or when a new key is added in the range
RethinkDB seems to be the only database that can do this today. The use case is:
when a user logs in, set up a changefeed on that user’s data (in RethinkDB this can be any query, but I’m simply using a secondary index range),
the changefeed gets all initial user data, gathering changes that happen during the initial download
after the initial data load, get subsequent changes (including those that happened during the initial download)
provide real-time updates in the user interface
To narrow things down, I don’t need to get all the updates, if a key is updated several times, I’m fine with getting just the latest value. But I can’t afford to miss changes, as that would leave the client UI out of sync.
From what I understood from FoundationDB docs, only watches on single keys are supported, and even then I am not guaranteed not to miss changes (there is an ongoing discussion in another thread in the forums). I can’t see any reason why what I described would not be possible fundamentally — but I might be missing something.
Is this something that is (or will be) doable with FoundationDB?
FoundationDB doesn’t support setting watches on ranges, but as a client there are schemes you can setup to monitor changes to a range. In the example where you want to setup a change feed on a user’s data, you could add a key that gets updated by the client every time an update is made to that user’s data. Then you can set a watch on that key to know when the user’s data is modified.
With this, you won’t know anything about the keys modified or their new values. To learn that, you could read the whole range and compare it with a version of the range that you’ve cached, or you could store a record of changes in the database and read that to know what’s new. If, for example, your watched key contains a versionstamped value, then you can also keep a list of modifications with versionstamped keys, and when your watch fires you can then read all new entries in your modification list with a versionstamp larger than your current versionstamp. In this scheme, you may need to have some way to clean up your modification history when it’s no longer needed.
Of course, to simplify the process of setting something like this up and using it, you could provide the ability to watch change feeds as part of a layer. The complexity of the above would be managed by the layer while giving the user a simplified API that provides the features you need.
With respect to missing updates, the only type of update that a watch may miss is an ABA update (i.e. value=A when the watch is started, changes to B, and then back to A). If you don’t care about intermediate values but only care about the latest value, then this may not be an issue for you. In the scheme I described above where the value uses versionstamps, the value will monotonically increase and you can’t get ABA updates.
I somewhat remember that thread, but I can’t find it. In any case, if you transactionally read a key and transactionally set a watch on that key, then the watch won’t be registered with the database until commit time but it is set to fire whenever a write is detected following the read.
The way this works is that as all FDB reads are versioned, the read is associated with some read version v. When a watch comes in, it is also associated with a version, in this case, the same version v. If there is a concurrent write to that key during the transaction’s lifetime (i.e., between start time and commit time), then the most-recently-written version of that key must (by necessity) be greater than v, so the watch will fire. If there is no such write, when the watch gets to the database, it will see that there hasn’t been a change yet and wait for one (firing then).
But the key point here is that reading the key and setting the watch happen transactionally. If you do something like put a watch on some key and trigger a long running job when it completes and then set a new watch after the job has ended, you might miss any update that happens between your job beginning and ending. To close the gap there, you will need to verify when setting the watch that the key hasn’t changed between job start and end time. If it has changed, then you would need to short circuit then and do whatever you do with a change. If it hasn’t, then you can set the watch.
I will also add that if you go with something like keeping a change-log in the database, then you can actually notice these kinds of changes with some caveats.
There are basically two ways to structure such a log. One is to keep track in one keyspace of the mapping from key to most-recently-modified version and another index from version to key. Then to get all updates, you read everything from the last time you’ve read to the end of the version to key index. To modify a key, you atomically update (1) the key itself, (2) its most-recently-modified version (or this can be kept as part of the value associated with the key), (3) remove the old entry in the version to key mapping, and (4) add the new entry to that index. Then only the most recent updates are in the index, so getting the most up-to-date values of all of the keys is fast if that’s all you care about. (In particular, if there is a hot key that is updated a bunch, despite all of the writes, you will see it only once in the index.) On deletes, you might need to do something like write a tombstone value into the key so that readers know that whatever the key was representing has been deleted (and not just not updated), and at some point, you should probably clean out the tombstones.
The other way is to keep instead an index from version to some kind of mutation representation (which is closer to a more traditional log). Then clients can munge through this mutation log to get updates instead. This gets all intermediate values if that’s what’s necessary, but it requires log rolling to avoid the log getting too big, which is another thing the client logic would have to handle.
Just to avoid any confusion, my note about ABA updates was in reference to the key that actually had a watch set on it. Our watches may not fire in the presence of an ABA update on the watched key. In the scheme above, though, the watched key is always changed with an increasing version, so it avoids the ABA update behavior.
Alec’s comments are in regard to detecting ABA updates on the keys that you are actually interested in detecting changes on. In this case, that would be the range that we are keeping a change-log on. His advice applies to detecting changes (including ABA changes) to any of the keys in the target range.
I’m afraid that this thread, like the previous one about watches, might leave the reader with an impression that they are a lot harder to use correctly than (I think) they actually are.
The basic idea of a watch is that you have something that you could monitor with a polling loop:
return tr["light_switch"] == "on"
state = get_light_switch_state(db)
set_lamp_state( state )
But this will either be slow to react (if the sleep is long) or use lots of resources (if the sleep is short). So you can replace the sleep with a watch:
return tr["light_switch"] == "on", tr.watch( "light_switch" )
state, watch = get_light_switch_state(db)
set_lamp_state( state )
and for the most part it will “poll” only when the state changes, but pretty quickly when it does, so you get the best of both worlds. Watches should be totally reliable for anything that polling would in principle work for.
Of course, if you change the light switch state and then change it back quickly, either version can “miss” that “ABA” update. Neither of these methods should be used when the goal is to produce a log of updates. The preferred approach to do the latter with FDB is to maintain such a log transactionally when doing the updates (as @alloc explains above). FDB also has the ability to log all updates to a given key range (this capability is used by the backup and DR tooling), but this facility is somewhat dispreferred and underdocumented because there is no way to provide FDB’s excellent backward compatibility to applications dependent on the format of these logs. They are likely to need updating to work correctly with new major versions of the database, and that’s a significant disadvantage.
Now to the original question: how can a layer or application best implement reactive features using FDB, where the data to be monitored for changes is not limited to a single key? I think the most attractive design is to maintain an index of what is currently being watched, and at update time only keep track of changes for the watched things. Depending on your data model and use case, you could make various decisions about what granularity to track each of these things at. For your requirements, you would probably only track which keys have changed since the watch was created, rather than keeping a log, but the latter option is available for use cases that need it.
Are there any features at the FDB level that would make this sort of thing more efficient? Range watches analogous to our single key watches are feasible, and would provide a nice interface in many cases closely analogous to the light switch example I give above, but when they fired you would have to read the entire range to find out what was changed, which makes them generally less attractive than single key watches. I think they would be unlikely to be the most efficient implementation of what you want, and they would come at some performance cost (the data structures for range watches on the storage server being somewhat slower than for individual keys). So I’m open to suggestions, but my first guess is that this is a very desirable layer feature but that the FDB API already offers the necessary low level tools.
Thanks for all the answers. I’m quiet, not because I’m ignoring the topic, but because I need to process the information and think about this carefully. I’m not new to distributed systems, which is why the thought of implementing my own (correct) changefeed system scares me.
I didn’t know about versionstamped keys (they don’t seem to be mentioned in the guides, only deeper in the API docs) — I can see how they can be extremely useful, but I need to start trying things out to understand exactly how.
I’m slightly curious why there are no watches on key ranges, though. It seems the basic functionality is already there, because that is exactly what transactions do: watch for changes to a key range.
There’s a few pieces to the answer to your question, and let’s start off with single key watches before expanding out into range watches.
The contract of a watch is that if the value of the key being watched permanently changes from the supplied value when watch is called at a version later than when the watch was set, then the watch will be triggered. What you appear to be asking for (or expecting) is a contract that if the value is modified, then the watch fires. The difference in implementation between these two is if version history is required.
Watches become complicated in what to do if a client goes away for a while, and then tries to resume its watch. If the system doesn’t know if any changes occurred on the watched key in the version range the client is interested in, then it has two options: trigger the watch or not. The former leads to systems that promise watches “will notify you if one or more changes might have happened”, versus the latter, where FDB is trying to promise “will notify you when a change definitely has happened”. FDB can get into a situation where it doesn’t know if a change happened between when a client started the watch and now because it only maintains 5 seconds of version history. It appears that RethinkDB’s change feeds are an example of the former, as if you’re “connected” then you get streamed notifications, but they make no guarantees of delivery. FDB watches guarantee delivery in a sense, as under failure, the watch will retry, and “eventually” you’ll be notified that a modification happened.
Suppose that we had a storage engine that could maintain, for any key, the most recent version at which it was modified. Then we would be able to support a watch of the form “notify me when key X was modified at a version later than Y”, as at any point we’d be able to know if a modification happened in that range by reading the key. Extending the watch to a range still presents a complication: knowing if a key was modified between a starting version and now for a client that’s missed version updates means a range read of the entire range, to see if there’s any key that’s been updated more recently. That’d be a somewhat expensive operation to be happening transparently, possibly by multiple clients, and particularly for watches that span a lot of data.
I do think it’s worth revisiting our watch API, particularly if we ever gain a natively MVCC storage engine. It’s possible that we’d downgrade our watch API’s contract to allow triggering the watch even if we’re unsure that any change happened, but I kind of like the current “it either happened or it didn’t” API…
If the client goes away (disconnects), I fully expect to have to re-initialize the watch. I do not expect to be able to ask for changes which happened since a specific version, only since I established the watch.
What I’m successfully using in RethinkDB right now are changefeeds of this type:
This is a changefeed on all keys in a secondary index range (for a specific username). It sets up a point in “time” when an initial snapshot is taken (as in the beginning of a transaction) and the “watch” established. I then get an initial data dump, and subsequent changes. The changes are “squashed”, which means I might not get all of the changes to a particular document (value under the key), just the latest one. Notice that versions do not figure into this. If I disconnect, I have to set up another changefeed, thus re-downloading all the initial values and re-establishing the “watch”.
Having read your explanation, I think I can see a problem with implementing this: after a value changes, the watch is “done”. It would need to be re-established automatically and atomically, or not disappear at all. But here again, I don’t see versioning as necessary: I am interested in changes since I established the watch.
So, I am not sure I follow everything you wrote: I do want to get notifications “when a change definitely has happened”, but only for those changes that happened since I established the watch. I don’t think this is different from monitoring if certain keys have changed in a transaction. In a way, one could think of changefeeds as “indefinitely open transactions”.