Tracking GRV latency to avoid getting a new read version from proxies when they are responding slowly is clever, but in some cases it could end up adding to load on the same storage server(s) which are causing ratekeeper to intentionally make GRV latencies high in the first place.
@ajbeamon and I were discussing this earlier at our mobile air fortress office (aka a commercial flight) and were thinking maybe there should be some support in our API for getting a cached read version (CAUSAL_READ_YOLO?) that could encapsulate some logic about whether it’s okay to proceed.
The most obvious thing to include is something about what ratekeeper is doing, if we can plumb that information efficiently. For example, if the GRV response from proxies included whether or not its latency was increased intentionally by ratekeeper, then the client could tell when a version is not safe to reuse as a cached version, because reusing it would circumvent ratekeeper’s attempt to limit transaction starts.
Going further, knowing whether ratekeeper was limiting due to logs or due to storage servers could also be useful. If the client knew that a transaction was read-only and that the most recent RV it holds was affected by ratekeeper because of logs, then perhaps it’s safe to use that RV as a cached read version, since a read-only transaction never commits and so adds no work for the logs. Something more complicated could probably also be done where reads are okay so long as you don’t talk to the storage server(s) that are triggering ratekeeper (though you could still potentially talk to storage caches for the same data).
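To make the proposed decision logic concrete, here is a minimal sketch of the check a client could run before reusing a cached read version. It assumes the GRV reply could carry ratekeeper metadata (whether the reply was throttled, whether it was due to logs or storage, and which storage servers are limiting); `GrvReply`, `LimitReason`, and `may_reuse_cached_version` are hypothetical names, not part of any existing FoundationDB API. It conservatively allows reuse under throttling only for read-only transactions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet


class LimitReason(Enum):
    """Hypothetical: why ratekeeper intentionally delayed a GRV reply, if it did."""
    NONE = auto()
    LOGS = auto()
    STORAGE = auto()


@dataclass(frozen=True)
class GrvReply:
    """Hypothetical GRV reply carrying ratekeeper metadata alongside the version."""
    version: int
    limit_reason: LimitReason = LimitReason.NONE
    # Storage servers ratekeeper is limiting on (only meaningful for STORAGE).
    limiting_servers: FrozenSet[str] = frozenset()


def may_reuse_cached_version(reply: GrvReply, read_only: bool,
                             servers_to_read: FrozenSet[str]) -> bool:
    """Return True if reusing reply.version would not bypass ratekeeper's limiting."""
    if reply.limit_reason is LimitReason.NONE:
        # The reply was not intentionally delayed, so reuse does not circumvent ratekeeper.
        return True
    if reply.limit_reason is LimitReason.LOGS:
        # A read-only transaction never commits, so it adds no load on the logs.
        return read_only
    # STORAGE: only safe if the transaction is read-only and avoids every storage
    # server ratekeeper is protecting (e.g. its reads are served by storage caches).
    return read_only and servers_to_read.isdisjoint(reply.limiting_servers)
```

Keeping the three cases explicit mirrors the escalating proposals in the thread: a single "was throttled" bit already prevents unsafe reuse, and the reason and server set only widen the cases where reuse is allowed.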
Hi Steve, all the suggestions you and AJ have made would be quite useful for caching the read version effectively.
Just a small correction:
…when GRV latency increases, clients throw away the cached version and force transactions to go to the proxy, giving control over starting new transactions back to the proxy. We only bypass the proxy while latencies stay below a threshold (i.e., while the cluster is in good health).
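The behavior described in the correction can be sketched as a small client-side cache. This is an illustrative sketch, not the actual client implementation: `CachedReadVersion`, `fetch_grv`, `latency_threshold_s`, and `max_age_s` are hypothetical names, and the maximum cache age is an added assumption so that latency keeps being re-sampled (a cache that never went back to the proxy would never notice latency rising).

```python
import time
from typing import Callable, Optional, Tuple


class CachedReadVersion:
    """Hypothetical cache: bypass the proxy only while GRV latency is below a threshold."""

    def __init__(self, fetch_grv: Callable[[], int], latency_threshold_s: float,
                 max_age_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_grv = fetch_grv        # asks a proxy for a fresh read version
        self._threshold = latency_threshold_s
        self._max_age = max_age_s          # assumption: bound staleness of a cached version
        self._clock = clock
        self._cached: Optional[Tuple[int, float]] = None  # (version, time fetched)

    def _fetch_from_proxy(self) -> int:
        start = self._clock()
        version = self._fetch_grv()
        now = self._clock()
        if now - start < self._threshold:
            # Cluster looks healthy: later transactions may reuse this version.
            self._cached = (version, now)
        else:
            # High latency (possibly ratekeeper throttling): drop the cache so every
            # transaction goes back through the proxy.
            self._cached = None
        return version

    def get_read_version(self) -> int:
        if self._cached is not None:
            version, fetched_at = self._cached
            if self._clock() - fetched_at < self._max_age:
                return version
        # No cached version, or it is too old: ask the proxy and re-measure latency.
        return self._fetch_from_proxy()
```

Because a slow reply clears the cache instead of refreshing it, the proxy regains control over transaction starts for as long as latency stays above the threshold.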