Hi @alexmiller,
-
I dug through the available trace logs from when our database was unhealthy but couldn’t find any RkUpdate events during that period. The only RkUpdate trace events I could find are from when I first brought up the cluster. We will try to reproduce the condition and I’ll collect any RkUpdate events that show up.
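In case it’s useful when we re-run the test, this is roughly how I’m planning to scan the logs for RkUpdate events. It’s only a minimal sketch assuming the default XML trace format; the log directory path is specific to our setup:

```python
#!/usr/bin/env python3
"""Sketch: scan FoundationDB XML trace logs for RkUpdate events."""
import glob
import xml.etree.ElementTree as ET

LOG_DIR = "/var/log/foundationdb"  # assumption: adjust to your fdbserver log dir

for path in sorted(glob.glob(f"{LOG_DIR}/trace.*.xml")):
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        continue  # a trace file still being written may be truncated
    for event in root.iter("Event"):
        if event.get("Type") == "RkUpdate":
            # Dump the full attribute map so we don't guess at field names.
            print(path, dict(event.attrib))
```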
-
Afaik, we aren’t doing anything special client-side for read version caching.
-
We do have a read-then-write workload, but the read-to-write ratio is 4:1.
One thing that I learned in the “What do you monitor?” thread is this:

> When the queues reach a certain threshold (currently 900 MB on storage, 1.6GB on logs), ratekeeper will slowly start to throttle transactions. If the queue continues to grow, the throttling becomes more aggressive, attempting to achieve certain limits (1GB on storage, 2GB on logs). It may be worth noting, though, that ratekeeper will let 1 fault domain’s worth of storage servers fall arbitrarily far behind without throttling, in which case the storage server will eventually stop trying to fetch new data when it reaches what we call the e-brake (1.5GB).
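To make sure I’m reading that correctly, here is a toy model of the thresholds as I understand them. This is only a sketch of the behaviour described in that quote, not the real ratekeeper algorithm, and the constant names are mine for illustration:

```python
"""Toy model of the throttling thresholds quoted above (not the real algorithm)."""

STORAGE_SPRING_START = 900e6   # storage queue size where throttling slowly begins
STORAGE_TARGET       = 1000e6  # storage queue size ratekeeper tries to hold
TLOG_SPRING_START    = 1.6e9   # tlog queue size where throttling slowly begins
TLOG_TARGET          = 2.0e9   # tlog queue size ratekeeper tries to hold
STORAGE_EBRAKE       = 1.5e9   # storage server stops fetching new data here

def throttle_factor(queue_bytes: float, spring_start: float, target: float) -> float:
    """Return 1.0 for no throttling; smaller values mean heavier throttling."""
    if queue_bytes <= spring_start:
        return 1.0   # below the spring region: full rate
    if queue_bytes >= target:
        return 0.0   # at/over the target: throttle hard (toy simplification)
    # Inside the spring region, throttling ramps up as the queue grows.
    return (target - queue_bytes) / (target - spring_start)
```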
In our case, when the database was unhealthy, one storage process had a peak queue size of 1.08GB, but the rest of the storage processes were below 560MB and all tx log processes had queue sizes around 515MB. Could it be that, because these conditions were suboptimal but never crossed the critical thresholds, the RK throttling never kicked in?
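If I’ve understood the fault-domain exception correctly, a rough check of our numbers against those thresholds would look like the sketch below. The queue values are illustrative stand-ins for what we observed, and whether ratekeeper really excludes that single worst process is exactly what I’m unsure about:

```python
# Illustrative stand-ins for our observed peaks (bytes).
storage_queues = [1.08e9, 560e6, 560e6]   # one outlier; the rest were < 560 MB
tlog_queues    = [515e6] * 3              # all tlog queues were ~515 MB

STORAGE_SPRING_START = 900e6   # storage throttling begins here (per the quote)
TLOG_SPRING_START    = 1.6e9   # tlog throttling begins here (per the quote)

# If ratekeeper ignores one fault domain's worth of storage servers, the single
# 1.08 GB outlier would not count; every remaining queue is below the spring start.
remaining = sorted(storage_queues)[:-1]
no_storage_throttle = all(q < STORAGE_SPRING_START for q in remaining)
no_tlog_throttle    = all(q < TLOG_SPRING_START for q in tlog_queues)
print(no_storage_throttle and no_tlog_throttle)   # True -> no throttling expected
```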