Ok, it looks like the network was to blame for the max row read latency. I’d naively assumed that lost packets would be retransmitted after roughly 2*RTT, but it looks like Linux waits at least 200ms before retransmitting by default. Setting rto_min to a smaller value (it is a per-route option, set via ip route) seems to improve the worst-case latency, at the cost of some unnecessary retransmits.
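To illustrate why the floor dominates on a low-latency network, here is a rough sketch of the standard retransmission timeout estimate from RFC 6298, with a configurable minimum. It is a simplification, not the kernel's code: it ignores clock granularity, Linux applies its minimum slightly differently, and the 0.5 ms RTT and 5 ms floor are made-up numbers for the example.

```python
def rto_after_samples(rtt_samples, rto_min, k=4, alpha=1 / 8, beta=1 / 4):
    """Estimate the retransmission timeout (seconds) from RTT samples.

    Follows the RFC 6298 smoothing rules, then clamps to rto_min.
    Simplified: no clock granularity term, no exponential backoff.
    """
    if not rtt_samples:
        raise ValueError("need at least one RTT sample")
    srtt = rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement initialises both estimators.
            srtt, rttvar = r, r / 2
        else:
            # Update the variance first, using the old smoothed RTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    return max(rto_min, srtt + k * rttvar)


# Hypothetical datacenter link with a steady 0.5 ms round trip.
samples = [0.0005] * 20
print(rto_after_samples(samples, rto_min=0.200))  # default-style 200 ms floor
print(rto_after_samples(samples, rto_min=0.005))  # hypothetical 5 ms floor
```

With a steady sub-millisecond RTT the computed estimate sits far below either floor, so the floor itself decides how long a lost packet waits before being resent.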
Regarding Latency98 in the trace file, it seems to only measure transactions that modify the database, so rather than the p98 of a single row read, I’m guessing the 20ms was the p98 of the time my test code took to complete a write transaction? Each transaction was doing approximately 2 gets and 3-4 (double) sets, so I’m guessing the per-op latency would work out to be about the 2ms you mentioned.
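The back-of-the-envelope division behind that guess can be sketched as follows. It assumes that "(double) sets" means two writes per set, and that every operation runs sequentially and costs about the same, which is a simplification; the function name is just for illustration.

```python
def per_op_latency_ms(txn_latency_ms, gets, sets, writes_per_set=2):
    """Spread a transaction's latency evenly over its operations.

    Assumes operations are sequential and equally expensive, and that
    each "double" set counts as writes_per_set operations.
    """
    ops = gets + sets * writes_per_set
    return txn_latency_ms / ops


# 20 ms p98 transaction latency, 2 gets, 3 or 4 double sets.
for sets in (3, 4):
    print(sets, "sets:", per_op_latency_ms(20, gets=2, sets=sets), "ms per op")
```

Under those assumptions the estimate lands between 2 and 2.5 ms per operation.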
Edit: I’ve spun this out into a new post: Proxies and packet loss/retransmits