Hi, we need to support a large number of connected clients in our workload. We want to test the maximum number of connected clients FDB can support and how FDB reacts when that limit is exceeded. We use the metric “cluster.clients.count”, as described in monitored-metrics.rst in the apple/foundationdb GitHub repository, to compare against the requested client count. From the alert suggestion there, it seems FDB should be able to support more than 1000 connected clients; however, that is not what we observed in our tests.
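A minimal sketch of how this metric can be read from the `status json` output, assuming `fdbcli` is on the PATH and prints plain JSON when run with `--exec`. The function names and the sample document are hypothetical and only for illustration:

```python
import json
import subprocess


def connected_client_count(status: dict) -> int:
    """Return cluster.clients.count from a parsed `status json` document."""
    return status["cluster"]["clients"]["count"]


def fetch_status(cluster_file: str) -> dict:
    """Run `fdbcli --exec "status json"` and parse its output.

    Assumes fdbcli is installed and the cluster is reachable;
    not called in this sketch.
    """
    result = subprocess.run(
        ["fdbcli", "-C", cluster_file, "--exec", "status json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Hypothetical, trimmed-down status document (not real cluster output).
sample = json.loads('{"cluster": {"clients": {"count": 640}}}')
print(connected_client_count(sample))
```

Comparing this value with the number of client processes started shows whether every client actually connected.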
Our cluster setup:
- FDB 7.1.15
- Multi-region deployment
- 9 coordinators, 3 per datacenter
- 21 VMs for the primary datacenter, 3 for the satellite datacenter, and 21 VMs for the remote datacenter
- In the primary datacenter: 3 VMs for logs, 3 VMs for stateless, and 15 VMs for storage
From our experiments, the maximum number of connected clients the cluster handles without any errors is 640: 320 clients doing read-modify-update against the primary datacenter and 320 clients doing read-only work against the remote datacenter.
We started seeing FDB process failures once we increased the number of clients beyond that.
- Connected client count at 720: this triggers an FDB transaction system recovery, and we see the following sequence:
  1. The cluster has some unreachable processes. Unable to retrieve all status information. (A stateless role is killed with "Fatal Error: Network connection failed".)
  2. Performing recovery transaction.
  3. (Re)initializing automatic data distribution
  4. Healthy
- Connected client count > 900: the FDB cluster becomes unavailable and cannot recover from the transaction system recovery.
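For anyone trying to reproduce this, here is a rough sketch of how the states above could be tracked from a parsed `status json` document while the client count is ramped up. It assumes the document exposes `cluster.recovery_state.name`, `cluster.messages[].name`, and `client.database_status.available`. The helper name and the sample values are hypothetical:

```python
def summarize_health(status: dict) -> dict:
    """Pull recovery state, cluster messages, and availability out of a
    parsed `status json` document, tolerating missing sections."""
    cluster = status.get("cluster", {})
    client = status.get("client", {})
    return {
        "recovery_state": cluster.get("recovery_state", {}).get("name"),
        "messages": [m.get("name") for m in cluster.get("messages", [])],
        "available": client.get("database_status", {}).get("available"),
    }


# Hypothetical sample resembling the 720-client case (not real output).
sample_status = {
    "client": {"database_status": {"available": False}},
    "cluster": {
        "recovery_state": {"name": "recovery_transaction"},
        "messages": [{"name": "unreachable_processes"}],
    },
}
print(summarize_health(sample_status))
```

Logging this summary alongside the connected client count makes it easier to see at which count the recovery starts.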
If we reduce the number of connected client processes but increase the number of threads per process, the cluster works fine. For example, the cluster can support 500 client processes × 4 threads per process, but not 1000 client processes × 1 thread per process.
Is this the expected behavior of FDB when it receives too many open client connections?
Is there any documentation we can refer to in order to understand the internal logic of how FDB handles too many connected clients?
I would love to understand this better and any help will be appreciated. Thanks!