I need a way to detect whether application nodes are alive or dead, so that I can schedule work items on them and abort the work items assigned to dead ones.
My current idea is to have each server periodically update a key that contains the current time plus the timeout delay, like (...., "watchdog", TIMESTAMP, SERVER_ID) = ''.
When someone finds, with a simple get_range(...0, ...now), a key whose timestamp is in the past, the corresponding node is declared dead, and all pending tasks for that node are reassigned or failed.
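Here is a minimal sketch of this scheme. It assumes a sorted in-memory list of (deadline, server_id) tuples stands in for the watchdog keyspace. Tuple-encoded keys sort by deadline first, so a prefix of the list plays the role of get_range(...0, ...now). The names heartbeat and expired are hypothetical, and clock skew is deliberately ignored here:

```python
import bisect
from typing import List, Tuple

# Hypothetical stand-in for the (..., "watchdog", TIMESTAMP, SERVER_ID) range:
# a list of (deadline, server_id) tuples kept sorted by deadline, then server_id.
WatchdogIndex = List[Tuple[float, str]]


def heartbeat(index: WatchdogIndex, server_id: str, now: float, timeout: float) -> None:
    """Replace the server's previous deadline with now + timeout."""
    # A real layer would clear the old key in the same transaction
    # (e.g. by remembering the previously written TIMESTAMP).
    index[:] = [entry for entry in index if entry[1] != server_id]
    bisect.insort(index, (now + timeout, server_id))


def expired(index: WatchdogIndex, now: float) -> List[str]:
    """Equivalent of get_range(...0, ...now): servers whose deadline is before now."""
    cut = bisect.bisect_left(index, (now, ""))
    return [server_id for _, server_id in index[:cut]]


idx: WatchdogIndex = []
heartbeat(idx, "a", now=100.0, timeout=10.0)  # deadline 110
heartbeat(idx, "b", now=105.0, timeout=10.0)  # deadline 115
print(expired(idx, now=112.0))  # ['a']
```

The checker only needs to read the prefix of the range up to its notion of "now". That is exactly why the rest of the question hinges on where "now" comes from.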
I could use the local clock of each server, but I’ve been bitten in the past by server clocks drifting by up to dozens of minutes in either direction (an Intel NUC whose NTP server was blocked by an improperly configured firewall). A server whose clock runs ahead would take too long to be diagnosed as dead. A server whose clock runs behind would be declared dead instantly. Worse, if the server checking for liveness has its clock ahead, it would go around killing every other server instantly, in a loop.
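The following arithmetic makes these failure modes concrete. Define each clock's offset as local time minus true time. A heartbeat written at true time t stores a deadline of t + offset_writer + timeout, and the checker fires once its own clock passes that value. The timeout actually observed in real seconds is therefore timeout + offset_writer - offset_checker. The function name below is illustrative:

```python
def effective_timeout(timeout: float, writer_offset: float, checker_offset: float) -> float:
    """Real seconds after a heartbeat before the checker declares the writer dead.

    Offsets are (local clock - true time) in seconds; positive means the clock runs ahead.
    A result <= 0 means the writer is declared dead as soon as the checker looks.
    """
    return timeout + writer_offset - checker_offset


print(effective_timeout(10, 0, 0))      # 10: both clocks correct
print(effective_timeout(10, 1200, 0))   # 1210: writer 20 min ahead, detected very late
print(effective_timeout(10, -300, 0))   # -290: writer 5 min behind, dead immediately
print(effective_timeout(10, 0, 1800))   # -1790: checker 30 min ahead kills everyone
```

A skew of a few minutes dwarfs a 10-second timeout in every case. This is why a single shared time source matters more than precision.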
I would like to have a shared source of time, in order to get a good enough approximation of the watchdog timer delay. I don’t need millisecond precision, only a way to detect a server that did not update its key within 5 or 10 seconds.
I thought of using versionstamps as a source of time, but their growth rate may not be constant, and it can also be changed in a config file. Also, I can only set a key with the current versionstamp, not with a versionstamp “10 seconds from now”.
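One way around the “10 seconds from now” limitation is to store the heartbeat's own (current) versionstamp and let the checker do the subtraction against its transaction read version. Below is a sketch of that conversion. It relies on the versionstamp layout (the first 8 bytes are the big-endian commit version, followed by a 2-byte batch order). It also assumes versions advance at about 1,000,000 per second. That is the usual target rate, but as noted above it is configurable and not guaranteed, so treat ASSUMED_VERSIONS_PER_SECOND and the helper names as hypothetical:

```python
import struct

# Assumption: the cluster advances roughly this many versions per second.
# This is a tunable server setting, not a guarantee.
ASSUMED_VERSIONS_PER_SECOND = 1_000_000


def commit_version(versionstamp: bytes) -> int:
    """Extract the 8-byte big-endian commit version from a 10-byte versionstamp."""
    if len(versionstamp) < 10:
        raise ValueError("expected at least 10 bytes")
    return struct.unpack(">Q", versionstamp[:8])[0]


def approx_seconds_since(versionstamp: bytes, read_version: int) -> float:
    """Rough real time between the heartbeat's commit and a later read version."""
    return (read_version - commit_version(versionstamp)) / ASSUMED_VERSIONS_PER_SECOND


# Fake stamp: commit version 5,000,000, batch order 0.
stamp = struct.pack(">QH", 5_000_000, 0)
print(approx_seconds_since(stamp, 17_000_000))  # 12.0
```

This avoids per-server clocks entirely, but its accuracy is only as good as the assumed version rate. That is the concern raised above.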
Looking at the system keyspace for inspiration, I stumbled upon the timekeeper, which seems to keep a map of read versions to system clock time, updated every second. Looking at the code, it appears that the current Cluster Controller uses its own clock to maintain a map of readVersion to sysTime, and keeps about six months’ worth of keys.
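To illustrate how such a map could be consumed, here is a sketch that works on an in-memory list of (read_version, unix_time) samples and interpolates between neighbouring samples to estimate the wall time of an arbitrary version. It does not assume anything about the actual on-disk key format under \xff\x02/timeKeeper/. The Samples alias and time_of_version are hypothetical:

```python
import bisect
from typing import List, Tuple

# Hypothetical in-memory copy of timekeeper-style samples:
# (read_version, unix_time) pairs, strictly increasing in version, ~1 per second.
Samples = List[Tuple[int, float]]


def time_of_version(samples: Samples, version: int) -> float:
    """Approximate wall time for a version by linear interpolation between samples."""
    if not samples:
        raise ValueError("no samples")
    versions = [v for v, _ in samples]
    i = bisect.bisect_right(versions, version)  # number of samples with v <= version
    if i == 0:
        return samples[0][1]   # before the first sample: clamp
    if i == len(samples):
        return samples[-1][1]  # at or after the last sample: clamp, no extrapolation
    (v0, t0), (v1, t1) = samples[i - 1], samples[i]
    return t0 + (t1 - t0) * (version - v0) / (v1 - v0)


samples = [(1_000_000, 100.0), (2_000_000, 101.0), (3_000_000, 102.0)]
print(time_of_version(samples, 1_500_000))  # 100.5
```

Because every sample comes from the Cluster Controller's clock, all nodes see the same (possibly skewed) timeline. For relative timeouts that consistency matters more than absolute accuracy.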
This would look like a good fit but I have a few questions:
- The timekeeper can be disabled using the \xff\x02/timeKeeper/disable key. What scenario requires it to be disabled? Can I assume that it is running under normal conditions?
- Is this an internal implementation detail that could change at any time, or can layers rely on it?
- Could this be used to approximate what a transaction read version will be 10 seconds from now, using the last N samples?
Again, I don’t need much precision, only “it is very likely that this key has not been changed in the last 10 seconds of real time”.
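For the last question, a straightforward approach would be an ordinary least-squares fit of version against time over the last N samples, extrapolated forward. The sketch below assumes the samples are (read_version, unix_time) pairs like the timekeeper's. The writer could then store predict_version(samples, 10.0) instead of a wall-clock TIMESTAMP, and a checker would compare that value with its own read version. The function name and the default n=10 are hypothetical choices:

```python
from typing import List, Tuple

Samples = List[Tuple[int, float]]  # (read_version, unix_time), oldest first


def predict_version(samples: Samples, seconds_ahead: float, n: int = 10) -> int:
    """Fit version = a + b * time over the last n samples (least squares),
    then extrapolate seconds_ahead past the newest sample's time."""
    recent = samples[-n:]
    if len(recent) < 2:
        raise ValueError("need at least two samples")
    ts = [t for _, t in recent]
    vs = [v for v, _ in recent]
    mean_t = sum(ts) / len(ts)
    mean_v = sum(vs) / len(vs)
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        raise ValueError("samples must span some time")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var_t
    target_t = ts[-1] + seconds_ahead
    return round(mean_v + slope * (target_t - mean_t))


# Perfectly regular history: 1,000,000 versions per second.
history = [(1_000_000 * i, 100.0 + i) for i in range(20)]
print(predict_version(history, 10.0))  # 29000000
```

A linear fit only reflects the recent average rate. If the version rate changes (for example after a configuration change), the prediction will lag until enough new samples arrive. That should be acceptable for a "very likely stale after 10 seconds" check.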