Are spikes of 500ms+ MaxRowReadLatency normal?

eastern-grey · July 7, 2019, 10:19am

OK, it looks like the network was to blame for the MaxRowReadLatency spikes. I’d naively assumed that lost packets would be retransmitted after roughly 2*RTT, but it looks like Linux TCP waits at least 200ms before retransmitting by default. Setting rto_min (a per-route option in iproute2) to a smaller value for the interface’s route seems to improve the worst-case latency, at the cost of some unnecessary retransmits.
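To illustrate why a single lost packet can cost about 200ms on a low-latency LAN, here is a minimal Python sketch of the RFC 6298 retransmission-timeout (RTO) estimator, with a configurable floor standing in for rto_min. It is a simplification: the class and parameter names are hypothetical, the steady 0.5ms RTT is an assumed example, and the Linux kernel's actual RTO computation differs in detail (it also backs off exponentially on repeated losses), although it likewise never goes below 200ms by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """Simplified RFC 6298 RTO estimator (hypothetical helper, not kernel code).

    rto_min_s stands in for the Linux rto_min floor (200 ms by default).
    """

    rto_min_s: float = 0.200
    alpha: float = 1 / 8  # smoothing gain for SRTT (RFC 6298)
    beta: float = 1 / 4   # smoothing gain for RTTVAR (RFC 6298)
    k: int = 4            # RTTVAR multiplier (RFC 6298)
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def sample(self, rtt_s: float) -> None:
        """Fold one RTT measurement (in seconds) into the estimator."""
        if self.srtt is None or self.rttvar is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = rtt_s
            self.rttvar = rtt_s / 2
        else:
            # RTTVAR is updated using the previous SRTT, then SRTT itself.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt_s)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_s

    def rto(self) -> float:
        """Current timeout in seconds, clamped from below by rto_min_s."""
        if self.srtt is None or self.rttvar is None:
            return 1.0  # RFC 6298 initial RTO before any RTT sample
        return max(self.rto_min_s, self.srtt + self.k * self.rttvar)


# Assumed example: a LAN with a steady 0.5 ms round trip.
default = RtoEstimator()
tuned = RtoEstimator(rto_min_s=0.005)  # hypothetical lowered floor of 5 ms
for _ in range(20):
    default.sample(0.0005)
    tuned.sample(0.0005)

print(f"default: {default.rto() * 1000:.1f} ms, tuned: {tuned.rto() * 1000:.1f} ms")
```

Under these assumptions the sketch prints 200.0 ms for the default floor versus 5.0 ms for the lowered one. The smoothed SRTT + 4*RTTVAR term stays around half a millisecond, so the floor alone determines how long a read waits after a single lost packet (plus one more round trip for the retransmission).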

Regarding Latency98 in the trace file: it seems to measure only transactions that modify the database. So rather than the p98 of a single row read, I’m guessing the 20ms was the p98 time my test code took to complete a write transaction? Each transaction did approximately 2 gets and 3-4 (double) sets, so I’m guessing the per-op latency works out to about the 2ms you mentioned.
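The back-of-envelope estimate can be written out as a tiny Python calculation. It assumes each "double" set counts as two writes (an interpretation, not something the trace reports), and the function name is hypothetical. Spreading a p98 evenly across operations gives only a rough average, since percentiles of a whole transaction do not split neatly into per-operation percentiles.

```python
def per_op_latency_ms(txn_latency_ms: float, gets: int, sets: int, writes_per_set: int = 2) -> float:
    """Spread one transaction's latency evenly over its operations (hypothetical helper).

    writes_per_set=2 encodes the assumption that each "double" set is two writes.
    """
    ops = gets + sets * writes_per_set
    return txn_latency_ms / ops


# 2 gets plus 3-4 double sets, against the ~20 ms p98 for write transactions.
fewest_ops = per_op_latency_ms(20.0, gets=2, sets=3)  # 2 + 6 = 8 ops
most_ops = per_op_latency_ms(20.0, gets=2, sets=4)    # 2 + 8 = 10 ops
print(f"{most_ops:.1f} to {fewest_ops:.1f} ms per op")
```

This prints "2.0 to 2.5 ms per op", which is in line with the roughly 2ms figure.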

Edit: I’ve spun this out into a new post: Proxies and packet loss/retransmits

Related topics

Topic	Category	Replies	Views	Activity
Storage queue limiting performance when initially loading data	Using FoundationDB	10	2680	October 14, 2019
High P99.9 Latencies (±70millis) on range reads (<1KiB) with ±1000 reads per second	Using FoundationDB (performance)	0	18	February 15, 2025
FoundationDB cluster performance issue - Periods of high disk I/O and sustained high latency	Using FoundationDB (performance)	21	2490	July 6, 2020
High client tail latency with Go bindings, increasing with CPU usage	Using FoundationDB (bindings, performance)	2	609	April 6, 2022
Rust FDB Record Layer Work-in-progress Repository	FoundationDB Layers	1	1424	January 26, 2023