I’ve been doing some performance tests as part of an evaluation for a customer project… The intended use case would be a large set (say 1000) of highly multi-threaded compute processes (ideally one per machine) connecting to a FoundationDB cluster, with a target throughput of 500k tx/sec.
Just running some test scripts on commodity hardware I have lying around, it seems like the single network thread per client process is a significant bottleneck. I have a single physical server running 4 FDB processes, and a separate test client machine, connected by a 1 Gbit/s network. The clients are C++, using the C API directly. The server is running the ‘memory’ storage engine and becomes disk-bound if I wind up the value sizes, but the network thread seems to bottleneck well before that:
1 process with 5000 threads doing a get/set cycle on thread-specific keys: around 1,000 tx/sec
100 processes with 50 threads (ditto): around 10,000 tx/sec
This factor of ten difference is consistent across different load patterns.
For the intended application having to start, manage and distribute load over lots of processes per machine really isn’t an option.
I see that Run one network thread per database · Issue #3336 · apple/foundationdb · GitHub was resolved with one thread per cluster, but I’m not sure that was what the original reporter was after (I think the word ‘database’ was ambiguous); it seems they had a similar problem to mine. It was suggested in the comments that one thread per FDBDatabase object would solve this, and it certainly would for me too (I could create a pool of them like I would for any other database connection). However, the PR that closed it (Multithreaded client by sears · Pull Request #4269 · apple/foundationdb · GitHub) implements one thread per cluster, and that isn’t under user control.
Are multiple threads to the same cluster possible? Or is there some other mechanism that could be used?
It was suggested in the comments that one thread per FDBDatabase object would solve this, and it certainly would for me too (I could create a pool of them like I would for any other database connection)
The multi-threaded client feature can be made to work in the way you describe, though I should caution that I’m not aware of anybody having used it this way extensively. It will also increase the number of independent connections to your cluster, and the number of connections a cluster supports is itself limited (I don’t know the current numbers for 6.3; possibly something like 5K client network threads can connect to a cluster).
In order to use it this way, you will need to set the client_threads_per_version option to the number of separate network threads you want to run, and from there open up a connection to the same cluster multiple times using something like fdb.open. The DB objects are assigned to threads in a round-robin fashion, so each of your objects will have a different thread.
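For what it’s worth, a minimal sketch of that setup against the C API might look like the following. This is a sketch, not a recipe: the option name is the one generated in fdb_c_options.g.h, the API version, cluster file path, and thread count are placeholders, and depending on your FDB version you may also need to point the client at an external copy of libfdb_c (error checking is omitted for brevity):

```cpp
#define FDB_API_VERSION 630
#include <foundationdb/fdb_c.h>

#include <cstdint>
#include <thread>
#include <vector>

int main() {
    fdb_select_api_version(FDB_API_VERSION);

    // Ask for several client network threads before the network starts.
    // Integer network options are passed as pointer + length.
    int64_t n_threads = 4;
    fdb_network_set_option(FDB_NET_OPTION_CLIENT_THREADS_PER_VERSION,
                           reinterpret_cast<const uint8_t*>(&n_threads),
                           sizeof(n_threads));
    // Depending on version, you may also need to set an external client
    // library option pointing at a copy of libfdb_c.

    fdb_setup_network();
    std::thread net([] { fdb_run_network(); });

    // Open the same cluster once per desired thread; the resulting
    // database handles are assigned to network threads round-robin.
    std::vector<FDBDatabase*> dbs(n_threads);
    for (auto& db : dbs) {
        fdb_create_database("fdb.cluster", &db);
    }

    // ... run transactions against dbs[i] from worker threads ...

    for (auto* db : dbs) fdb_destroy_database(db);
    fdb_stop_network();
    net.join();
    return 0;
}
```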
Some language bindings attempt to cache the DB object when you open it and return it again if you try to open it a second time. If you are using one of these language bindings, then I think you can get around this by copying your cluster file multiple times and using a different file for each connection. Python is an example of a binding that uses this caching, whereas I believe Java does not. If you are using another binding, I can potentially check that for you.
Oh, that’s great news, thank you. I had misunderstood that change, then, or perhaps there is another one.
Understood about the server-side limits, but those apply whether we scale with processes or with threads. It probably won’t require extreme values of client_threads_per_version, just more than one!
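The pool I have in mind could be as simple as an atomic round-robin index over the pre-opened handles; a sketch (`Handle` is a stand-in for `FDBDatabase*`, kept generic here so it’s easy to test):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Sketch of a round-robin pool over pre-opened database handles, as one
// might wrap multiple FDBDatabase objects opened against the same cluster.
template <typename Handle>
class DbPool {
public:
    explicit DbPool(std::vector<Handle> handles)
        : handles_(std::move(handles)) {}

    // Hand out handles round-robin; each handle is pinned to one client
    // network thread, so load spreads across all of them.
    Handle& next() {
        std::size_t i = counter_.fetch_add(1, std::memory_order_relaxed);
        return handles_[i % handles_.size()];
    }

private:
    std::vector<Handle> handles_;
    std::atomic<std::size_t> counter_{0};
};
```

Worker threads would each call `next()` per transaction (or hold one handle for their lifetime), which keeps the load roughly balanced across network threads without any locking.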
I’m currently working directly against the C API, so I’m in complete control. The customer will probably want to use an (unofficial) C# binding, though, so I will check whether that caches DB objects or not.