High GRV proxy latency

Recently, we have observed high GRV proxy latency in one of our clusters. The average GRV proxy latency is about 300us, with the maximum reaching 50ms. After checking the logs of the GRV proxy servers, we found the following types of warning messages:

  • LargePacketSent/LargePacketReceived approximately once per second
  • GrvProxyRateLeaseExpired once every 5 seconds
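
For anyone who wants to measure these frequencies on their own cluster, here is a minimal sketch of how they could be counted from a trace file. It assumes the XML trace format, with one `<Event ... />` element per line carrying `Type` and `Time` attributes (`Time` in seconds); the function name `event_rates` is hypothetical and not part of FoundationDB.

```python
import re
from collections import Counter

# Assumption: XML trace format, one <Event .../> per line, with Type and
# Time attributes. \b keeps "Time" from matching inside "DateTime".
TYPE_RE = re.compile(r'\bType="([^"]+)"')
TIME_RE = re.compile(r'\bTime="([0-9.]+)"')

WATCHED = {"LargePacketSent", "LargePacketReceived", "GrvProxyRateLeaseExpired"}


def event_rates(lines, watched=WATCHED):
    """Return {event type: events per second} over the time span of the input."""
    counts = Counter()
    first = last = None
    for line in lines:
        time_match = TIME_RE.search(line)
        type_match = TYPE_RE.search(line)
        if not time_match or not type_match:
            continue  # not an event line
        ts = float(time_match.group(1))
        # Track the span using every event, not only the watched ones.
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
        if type_match.group(1) in watched:
            counts[type_match.group(1)] += 1
    if first is None or last <= first:
        return {}  # no usable time span
    span = last - first
    return {name: n / span for name, n in counts.items()}
```

To use it, open a trace file from the GRV proxy process and pass the file object in, for example `with open(path) as f: print(event_rates(f))`. Comparing the rates before and after a restart or upgrade shows whether the warnings track the latency increase.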

Our FoundationDB version is 7.1.31. The code stack corresponding to LargePacketSent is as follows:

Does this mean that there is an issue with our network, and how could we further troubleshoot this problem? We have encountered high GRV proxy latency before, and at that time, we chose to restart all servers, after which the latency returned to normal. However, after running for a while, the latency of the GRV proxy gradually increased again.

Thanks!

I noticed that there is a bug fix related to GRV proxy latency in version 7.1.34 (10627). Maybe this can solve our problem.

Yes. That will almost certainly fix your problem. The large packet you saw is probably the response from Ratekeeper, which contains too many stale storage servers. As a result, the GRV proxy spent more time processing the packet, causing high GRV latency.
