I’ve been doing some performance tests as part of an evaluation for a customer project… The intended use case would be a large set (say 1000) of highly multi-threaded compute processes (ideally one per machine) connecting to a FoundationDB cluster, with a target throughput of 500k tx/sec.
Just running some test scripts on commodity hardware I have lying around, the single network thread per client process appears to be a significant bottleneck. I have a single physical server running 4 FDB processes, and a separate test client machine, connected by a 1 Gbit/sec network. The clients are C++, using the C API directly. The server is running the ‘memory’ storage engine and becomes disk-bound if I wind up the value sizes, but the network thread seems to bottleneck well before that:
- 1 process with 5000 threads doing a get/set cycle on thread-specific keys: around 1,000 tx/sec
- 100 processes with 50 threads (ditto): around 10,000 tx/sec
This factor of ten difference is consistent across different load patterns.
For the intended application, having to start, manage, and distribute load over many processes per machine really isn’t an option.
I see that Run one network thread per database · Issue #3336 · apple/foundationdb · GitHub was resolved with one thread per cluster, but I’m not sure that is what the original reporter was after (the word ‘database’ was ambiguous, I think); it seems they had a similar problem to mine. It was suggested in the comments that one thread per FDBDatabase object would solve this, and it certainly would for me too (I could create a pool of them, as I would for any other database connection). However, the PR that closed the issue (Multithreaded client by sears · Pull Request #4269 · apple/foundationdb · GitHub) implements one thread per cluster, and that is not under user control.
Are multiple threads to the same cluster possible? Or is there some other mechanism that could be used?