High client tail latency with Go bindings, increasing with CPU usage

Hi,

We have run into a performance barrier while trying to minimize the latency of read transactions in our applications. Our service currently handles around 20,000 read-only requests/s, split across roughly 40 instances (processes). Each request is small, a few hundred bytes. The service uses the Go bindings of fdb.

We note that the usual p95 latency of such a transaction is around 9ms, measured as the time it takes the fdb Go library to return from the call. This kind of latency is acceptable to us. However, p99 goes up significantly to ~16ms, and p995 and p999 are consistently way up, on the order of ~40ms and ~70ms respectively. This extreme tail latency is what we are trying to improve.

The requests are very standard read transactions, written to mirror the usage in the Go bindings documentation (rough sketch below). GetRange transactions of ~100 keys are consistently only ~1ms slower (at all percentiles) than simple Get reads, so per-transaction throughput is not the problem.
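
For reference, here is a minimal sketch of the shape of these transactions; the keys, range, and limit are placeholders, and the real code adds metrics and error handling:

```go
package main

import (
	"log"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	// API version 600, matching what we run in production.
	fdb.MustAPIVersion(600)
	db := fdb.MustOpenDefault()

	// Simple point read.
	val, err := db.ReadTransact(func(rtr fdb.ReadTransaction) (interface{}, error) {
		return rtr.Get(fdb.Key("some-key")).Get()
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = val

	// Range read of ~100 keys, same shape as our GetRange transactions.
	kvs, err := db.ReadTransact(func(rtr fdb.ReadTransaction) (interface{}, error) {
		rng := fdb.KeyRange{Begin: fdb.Key("prefix\x00"), End: fdb.Key("prefix\xff")}
		return rtr.GetRange(rng, fdb.RangeOptions{Limit: 100}).GetSliceWithError()
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = kvs
}
```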

Some info:

  • fdb client/server version in use: 6.2.30
  • API: 600
  • go version: 1.17, same results on other versions

Since we are on fdb 6.2.30, the database could be upgraded. Having read the changelog, though, I have not noticed any fixes or features that look promising for this problem.

We don’t see any significant slowdowns or other worrying metrics on the database side and have therefore ruled out the server side. We believe the high latencies come from client-side network I/O or thread utilization, possibly related to the way Go handles cgo calls into the underlying fdb C client library.

I have already experimented with the number of processes running the service to get more throughput out of the fdb client network thread (one per process). Doubling the number of instances from 20 to 40 brought significant improvements to the p95 times, but adding more instances does not yield better results.

We have observed a strong correlation between increased latency and higher CPU usage on the machine. While it would normally not be surprising that a machine starved for CPU produces bad latencies, the machines here are never pushed above ~90% CPU. We use the standard Docker/Kubernetes way of separating the CPU usage of applications, so the service should generally be receiving enough CPU time to operate. Nonetheless, the latency of the service instances running on machines at 60% CPU utilization is much lower than on machines running at 85%.
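
One sanity check on the "enough CPU time" assumption would be the CFS throttling counters of the container's cgroup, roughly as sketched below; the path assumes cgroup v1 (on cgroup v2 the file is /sys/fs/cgroup/cpu.stat and the field is throttled_usec):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// NOTE: assumed cgroup v1 layout inside the container.
	f, err := os.Open("/sys/fs/cgroup/cpu/cpu.stat")
	if err != nil {
		fmt.Println("could not open cpu.stat:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// cgroup v1 reports nr_periods, nr_throttled and throttled_time (ns);
		// a growing nr_throttled means the container is hitting its CFS quota
		// even though the node as a whole is below 100% CPU.
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 {
			fmt.Printf("%s = %s\n", fields[0], fields[1])
		}
	}
}
```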

Using Go’s pprof tool we see that cgocall in the fdb Go bindings accounts for almost half of the CPU time of the app. I’m unsure, though, how that should be interpreted: if the cgocall blocks waiting on I/O, it should not, to my understanding, register CPU time in pprof. About 95% of that time originates from the fdb.StartNetwork func.
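
For completeness, a minimal sketch of the kind of pprof setup used to gather this (the listen address is just a placeholder):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose the profiling endpoints; the CPU profile that shows cgocall is
	// then collected with e.g.:
	//   go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the actual service (fdb reads, request handlers, etc.) runs here ...
	select {}
}
```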

I understand that all of this is, obviously, not enough information to definitively pinpoint the problem. Nonetheless, I would like to ask if there is anyone here that has experienced similar problems or would know of a good approach for debugging the latency issue. Any help would be greatly appreciated :slight_smile:

Have you ruled out GC?

Hi, thanks for the reply. I have not ruled out GC. It seems quite unlikely to me that this is the source of the problem, but I’ll investigate further in that direction.
Thanks for the input!
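
For the record, the first thing I plan to look at is the GC pause distribution via runtime/metrics (available since Go 1.16), alongside GODEBUG=gctrace=1; a rough sketch:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Read the stop-the-world GC pause histogram exposed by the runtime,
	// to see whether any pauses land anywhere near the tens-of-milliseconds
	// range we observe at p999.
	samples := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
	metrics.Read(samples)

	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric not available on this Go version")
		return
	}
	h := samples[0].Value.Float64Histogram()
	for i, count := range h.Counts {
		if count > 0 {
			fmt.Printf("[%g, %g) s: %d pauses\n", h.Buckets[i], h.Buckets[i+1], count)
		}
	}
}
```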