Single client performance


We are using Node.js and a 12-machine FDB cluster (1-3 processes each). We have ~10 Node.js instances, some of which are constantly pulling data in an infinite loop.

Everything worked fine until, on the Node.js side, we accidentally started a lot of concurrent transactions (say ~1-4k each) and our backend started to suffer. We upgraded our FDB setup from 5 to 12 nodes - nothing changed. We looked at the FDB processes - everything looked fine (less than 10% CPU, low IOPS). Our Node.js processes were also fine, but operations became unstable, with large delays (seconds).

The last possible bottleneck is the FDB client. We are using one Node.js process + one FDB client instance, and we started to investigate whether the client side had become saturated.
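A minimal sketch of the kind of client-side saturation check this implies (the stub operation and all names here are hypothetical; a real test would issue actual FDB transactions through the Node bindings instead of the stub): time a batch of concurrent operations and watch how tail latency moves as concurrency grows.

```javascript
// Time `concurrency` operations running at once and report the p99 latency.
// `op` is a stand-in for a real FDB transaction (hypothetical stub below).
async function measure(concurrency, op) {
  const latencies = [];
  await Promise.all(
    Array.from({ length: concurrency }, async () => {
      const t0 = process.hrtime.bigint();
      await op();
      // convert ns -> ms
      latencies.push(Number(process.hrtime.bigint() - t0) / 1e6);
    })
  );
  latencies.sort((a, b) => a - b);
  return latencies[Math.floor(latencies.length * 0.99)];
}

// Stub standing in for a real read transaction against the cluster.
const stubOp = () => new Promise((resolve) => setTimeout(resolve, 1));

measure(1000, stubOp).then((p99) => {
  console.log(`p99 = ${p99.toFixed(1)} ms`);
});
```

If the client is the bottleneck, p99 keeps climbing with concurrency even while the cluster's CPU and IOPS stay low, which matches the symptom described above.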

I have found multiple explanations on this forum about how the FDB client works:

  • One place mentioned that a client can have one transaction per connection at a time. Does this mean that one client == one connection == one in-flight transaction? That doesn’t seem right.
  • What is the expected single-client transaction performance? How many transactions can each client handle at the same time?
  • Does the FDB client do any special scheduling to avoid being “DDoSed” by a single transaction? Does the client distribute the workload evenly between all current transactions? Are transactions more like actors with their own mailboxes, or do all R/W operations go through the same queue?
  • Is it possible (or advisable) to use multiple clients in a single application instance? (IIRC the Go binding doesn’t support this.)


What I noticed with the Java client, though this should be relevant to all bindings since they all use the native C client:
The FDB client uses a single thread for all networking no matter how many FDB class instances you create (this is hard-coded in the client C code), and that thread happens to be a bottleneck for me at around ~3K transactions per second. So I have to create several client processes on the same server to overcome this 3K limit.

I had the same issue running the Java client through an Apache Flink job. To work around it, I had YARN assign multiple JVMs (TaskManagers) per node instead of one large one. It's slightly annoying, since it causes more network IO during the shuffle stage, but at least our peak write throughput is more stable.

It would really help if there were a way to tell the C library to use more than one thread for the network stack.