FoundationDB

Watchers and not missing changes


(Rowan Goemans) #1

Hello everyone,

I’m considering FoundationDB, as one of the key features I need is the watch functionality. I have multiple servers that watch for a change and then need to act on it. According to the documentation I read here: https://apple.github.io/foundationdb/developer-guide.html#watches watches may not fire for every change that has occurred, which is fine for my application. However, the documentation suggests it is possible to not have the latest version of a value when watching, because this sequence can occur:

  1. Start watching key
  2. Someone makes a change
  3. Watcher is notified and watcher reads the value
  4. Someone makes another change
  5. Start watching key

There is a gap between being notified, reading the changed value, and starting a new watch; if a change occurs inside that gap, I don’t think it is caught.
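The window can be sketched with a toy model (plain Python, not the fdb API; the "watch" here is just a closure that fires when the value differs from the value at the time the watch was created):

```python
# Toy model of the race above: invented names, not the fdb bindings.
store = {"key": "v0"}

def start_watch():
    initial = store["key"]
    return lambda: store["key"] != initial   # "fires" on a different value

w = start_watch()           # 1. start watching key
store["key"] = "v1"         # 2. someone makes a change
assert w()                  # 3. watcher is notified...
seen = store["key"]         #    ...and reads the value ("v1")
store["key"] = "v2"         # 4. another change lands in the unwatched gap
w = start_watch()           # 5. start watching again, now from "v2"
assert not w()              # the "v1" -> "v2" update never fires anything
```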

Am I correct in this assessment, or does the client perform some magic so that it knows the last watched value?

Regards


(Justin Lowery) #2

Yes, that is correct. That condition is essentially what the note in that section is describing.


(Rowan Goemans) #3

Then what is the use case for watches? You can’t really build anything on top of them, since you can never be sure you have the most recent version of a value. That quickly becomes a huge problem for bursty changes, where a client could hold an old version of a value for days or weeks.


(Ryan Worl) #4

There is probably a better idea than this for implementing what you’re looking for, but if all you’re after is making sure a watcher can read every change on a specified key, you can create a watch on a “signal” key that is updated with SET_VERSIONSTAMPED_VALUE, and store the individual changes in a range of keys ordered using SET_VERSIONSTAMPED_KEY, with the change you need your clients to observe as the value. Both operations are done within a single transaction, so they are atomic.

This way you don’t have to do polling (which I think is the point of watches in the first place given the example code in Python), but you also have the benefit of reading each change individually if you need it.

You’ll also need to garbage collect the old values when you’re finished.

The key space will look something like this:

TARGET_KEY => versionstamp_0
TARGET_KEY/versionstamp_0 => change_0

TARGET_KEY => versionstamp_1
TARGET_KEY/versionstamp_1 => change_1

You use the watch to know when the key has changed, but you only do a range read on the versionstamped portion to know what the actual change was.
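A toy in-memory sketch of the pattern (invented names; a real implementation would use the fdb bindings’ SET_VERSIONSTAMPED_VALUE / SET_VERSIONSTAMPED_KEY atomic operations inside one transaction, with real versionstamps instead of a counter):

```python
import itertools

class Store:
    """Toy stand-in for the key space sketched above."""
    def __init__(self):
        self.kv = {}                     # flat key/value map (toy)
        self._stamp = itertools.count()  # stand-in for versionstamps

    def record_change(self, target, change):
        # One "transaction": bump the signal key AND append to the log.
        stamp = next(self._stamp)
        self.kv[target] = stamp          # TARGET_KEY => versionstamp_n
        self.kv[(target, stamp)] = change  # TARGET_KEY/versionstamp_n => change_n

    def read_changes(self, target, after=-1):
        # Range read over the versionstamped portion, in order.
        return [v for (k, v) in sorted(
            (k, v) for k, v in self.kv.items()
            if isinstance(k, tuple) and k[0] == target and k[1] > after)]

store = Store()
store.record_change("TARGET_KEY", "change_0")
store.record_change("TARGET_KEY", "change_1")
print(store.read_changes("TARGET_KEY"))   # ['change_0', 'change_1']
```

A watcher would watch the signal key, then range-read everything after the last versionstamp it has processed.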


(Rowan Goemans) #5

As illustrated in my original post, I don’t need every change; I simply can’t accept that a change is never observed. That is what would be broken for my application.

Client A watches value
Client B sets value
Client A gets notified

  1. Client A reads value
  2. Client B sets value
  3. Client A watches value

I have marked the problematic section with numbers. The value set in step 2 is never observed. I don’t need every change: Client B can change the value 1000 times, and it’s fine that Client A only gets notified once. What is not fine is that an unobserved change slips through the system. For instance, say I have a device that saves bursty data every 10 minutes and another client that watches; that client might then hold stale data for 10 minutes.

Watching on a signal key doesn’t solve this problem.


(Christophe Chevalier) #6

You are supposed to read the value and set the watch in the same transaction, so:

  1. Client A reads value and watches value
  2. Client B sets value
  3. Client A is notified.

The issue that stamps & atomic ops help solve is:

  1. Client A reads value (=1) and watches value
  2. Client B sets value to 2
  3. Client C sets value to 1
  4. Client A is notified (and reads either 1 or 2 depending on timing)

That’s why you need another key to detect this ABA scenario if this breaks your invariants.

But in both cases, your trigger should be called.
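A toy sketch of why reading and watching at the same version closes the gap (invented names, not the fdb API; with the real Python bindings both operations happen inside one transaction, roughly `value = tr[key]; watch = tr.watch(key)`):

```python
class Store:
    """Toy model: a watch remembers the value at creation and fires when
    the stored value becomes different from it."""
    def __init__(self):
        self.value = None
        self.watches = []          # (value_at_creation, fired-flag)

    def read_and_watch(self):
        # Snapshot read + watch at the same version: no window in between.
        flag = {"fired": False}
        self.watches.append((self.value, flag))
        return self.value, flag

    def set(self, new_value):
        self.value = new_value
        for seen, flag in self.watches:
            if new_value != seen:  # fires only on a *different* value
                flag["fired"] = True

store = Store()
store.set("v1")
seen, watch = store.read_and_watch()  # A reads "v1" and watches atomically
store.set("v2")                       # B's write after the read still fires
assert watch["fired"]
```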


(A.J. Beamon) #7

Christophe’s suggestion to read and watch the value in the same transaction is the way to accomplish what you’re after. Basically, the contract behind a watch is that it will notify you eventually if the value becomes and stays different than the value at the read version of your transaction. If you’ve set the value in your transaction already, then it will instead notify you if the value becomes different than what you’ve set it to.

One important thing to note here, though, is that you may not actually be notified of a change in an ABA scenario. If the value is changed back to the same value that it was at the time the watch was created and the watch hasn’t yet fired, then it won’t fire until it’s changed to a different value again. See the API documentation (e.g. Python) for a more thorough explanation.
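The ABA caveat can be sketched with a toy polling watch (invented names, not the fdb implementation): a watch that is checked by polling only compares the current value against the value at creation, so a change that is undone before the next check is never seen.

```python
store = {"k": 1}

class PolledWatch:
    def __init__(self, store, key):
        self.store, self.key = store, key
        self.initial = store[key]       # value when the watch was created

    def fired(self):
        # Fires only if the value is *currently* different.
        return self.store[self.key] != self.initial

w = PolledWatch(store, "k")
store["k"] = 2                  # A -> B
store["k"] = 1                  # B -> A, before the watch is checked
assert not w.fired()            # the intermediate change went unobserved

# One mitigation: make ABA impossible, e.g. a value that only increases.
store["k"] += 1
assert w.fired()
```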


(Christophe Chevalier) #8

I’m surprised by that, because I don’t see anything in storageserver.actor.cpp that would make the watch not fire when a key is mutated. I see that watchValue_impl checks whether the value has already changed when the watch request arrives, but the AsyncMap only contains the keys (not the values) and is unconditionally triggered when a mutation is applied. Does this behavior come from before the watch reaches the storage actor?


(A.J. Beamon) #9

I’m not sure what all the cases are that could lead to this behavior, but there’s the fallback polling mechanism, for example, if there are too many watches. watchValue_impl is running a loop that checks the value both at the beginning and every time the key is updated, not returning until the value has changed, and it might be possible for multiple updates to appear between when the key is updated and when the value is checked. Also, data movement will cause relevant watches to be cancelled on the storage server and moved to a different one, during which time multiple updates could happen.


(Christophe Chevalier) #10

Oh right, I remember seeing something about polling. Looks like this is another reason why watches should probably only be used with atomic operations, or with values that can only go up.

But are we still assured they will fire if the updates are atomic adds or versionstamped values (i.e. situations where ABA is impossible)?


(A.J. Beamon) #11

Yes. When a watch is moved due to data movement, an error goes back to the FDB client, which automatically reissues it to a new storage server with the same value as before. Of course, there are other types of errors that bounce all the way back to your client code; to not miss an update in that case, you would need to have remembered the value when you set the watch, read the now-current value to compare against it, and then set a new watch.
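A sketch of that recovery logic. `read_and_watch` is a hypothetical helper (assumed, not part of the fdb API as-is) that reads a key and sets a watch on it in one transaction, returning `(value, watch)`:

```python
def recover_after_watch_error(read_and_watch, key, remembered, on_change):
    """After a watch error reaches client code: re-read the key, compare
    with the value remembered when the old watch was set, and report any
    update that happened while we were unwatched."""
    current, new_watch = read_and_watch(key)
    if current != remembered:          # a change slipped by while unwatched
        on_change(current)
    return current, new_watch

# Usage with a stub helper standing in for the real transaction:
seen = []
value, watch = recover_after_watch_error(
    lambda key: ("v2", object()), "mykey", "v1", seen.append)
assert value == "v2" and seen == ["v2"]
```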


(Alex Miller) #12

The watch only fires when a key is changed. The onChange future only causes the loop to resume, as there’s no reply.send after the wait(watchFuture) call. I honestly did find this surprising when reading through the code.

The underlying NativeAPI implementation of watches has support for providing an initial value, which would make doing a read-compare-watch transaction unnecessary. However, the ability to provide the initial value appears not to be exposed through the bindings? Was there a reason for this?

That’s the impression I got after thinking about how the code works. It looks like one should either trigger watches with an increment and always supply the last known value to the watch, or clear the key and set a watch on it transactionally.


(A.J. Beamon) #13

This API is used by ReadYourWrites because it needs to be able to supply a value that could be the result of a mutation in the same transaction. In some cases NativeAPI does use the remembered value to reissue a request, but in others it propagates errors back out to let the client decide what to do. You could potentially avoid doing another read on client retries by letting the client specify the value it remembered, and I can’t recall if that was consciously considered as an option. If so, I don’t know if there were any objections to it, but I can’t think of any now.


(Geo) #14

I feel like we have somehow thrown out the baby with the bathwater here – the baby being “Event-Driven Architecture”

You know, all that Kafka mindshare and Confluent Professional Services contracts – that Event-Driven Architecture. The same “Event Thinking” that Gartner is jumping up and down about. Why do I feel like Jeff Goldblum in Jurassic Park asking “Hello? There are going to be some Dinosaurs on the Dinosaur tour??”

I want my Event Sourcing model to actually allow me to listen and react to Events – and in case you have not realized what you really have, I’ll tell you: God’s gift to “Event Thinking”.

I don’t wanna hear about jumping through hoops to do events right, so let me delineate the hoops.

The constructs we need are actors, agents, and “the fabric”. Agents handle the listening and dispatching to Actors. Agents declaratively specify the “pattern match” on a particular topic. Agents allow Actors to practice the Hollywood Principle and provide implicit invocation.

Actors are pampered. They are spoon fed an event, take the stage, do their work, and wait for the next gig. If they need data not contained in previous events (or discarded), it can “Query” an assistant to get the lines – “Line please”. The Data Fabric is like a Concierge at the Actor’s Guild – handling the “off topic” information needs (since events are always the minimal expression of “on topic” information).

Agents ingest events and invoke Actors. It’s a simple job of dispatching, so a Declarative DSL is indicated.

Actors ingest events, process something, and emit an event. Said another way, Actors transform Events – like ETL. That’s what streaming is. Some actors transform from the event log to a data model. This is how the data fabric gets populated. It’s all Actor based, and you must have events exposed.

To reap CQRS benefits, commands and queries are separated, which means writing and reading from possibly different data models. This write-to-the-log, then listen-to-the-log-and-write-to-the-SQL-DB approach has historically led to eventual consistency, since ACID was not supported. You’ve blown the doors off Polyglot Persistence with not only ACID across the layers separating commands from queries, but also the ability to write to/through a “projection” to achieve cross-model consistency – and I’m not sure even Kleppmann has thought about this use of the term. Writing to the metadata storage layer as a common event log would be the fastest, but writing through a layer to effect the same transitive impacts on the layers is just a hop away. Suddenly, Polyglot Persistence with ACID transactions is blooming.

Other DBs run into issues when writes are implicitly followed by a read. Traditionally, you write, and then at some future time, somebody queries the info. In EDA, there is/are one or more immediate listeners (agents) hooking each event.

It’s called “Event-Driven” for a reason. We have to get to the events in order to process them – which means replication to scale topics and triggers, not polling.

If we call that “watches” in this parlance, that’s cool. A rose… But when I hear “you gotta code it this way because if you code it the other way events can get inconsistent”, then I have to say that in the kingdom of Polyglot ACID Persistence, this emperor is naked! Events are used to drive replication and transactional locks for consistency – so somehow you threw the baby out with the bathwater.

Solve EDA, and you have the “Killer App” for Linearized, Stateful Replication at the most fundamental level.

Hope somebody can “understand the words that are coming out of my mouth.” :slight_smile:

Respectfully,
George Willis
Duke Energy Modern Architecture