Expected limit of client network thread

Looking through https://github.com/apple/foundationdb/issues/3336, Markus mentioned that 2k TPS was surprisingly low for a single client. I’m seeing the network thread max out at about 1.2k TPS, so it would be great to know if there is something I could do to increase that throughput. Each transaction is a single write batch of 400 keys of ~150 bytes each, so we’re writing ~75 MB/s through that thread. I do see the network thread CPU maxed out, but I have not gotten a profile of it just yet. Since we never do reads, I turned off read your writes on all the transactions, and that was a big win for throughput. Any suggestions would be more than welcome.

It seems plausible that you could be maxing out the network thread at 75 MB/s and 480k writes/s. I would be interested to hear what results you get from profiling when you’ve done it.
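For reference, the 480k writes/s figure follows directly from the numbers in the original post. A quick back-of-the-envelope check, assuming exactly 1,200 transactions/s, 400 writes per transaction, and 150 bytes per write (all rounded from the post, so the real totals will differ somewhat):

```python
# Rough throughput estimate from the figures quoted in the thread.
# All three inputs are approximations taken from the original post.
TRANSACTIONS_PER_SEC = 1_200   # observed ceiling of the network thread
WRITES_PER_TRANSACTION = 400   # keys written per batch
BYTES_PER_WRITE = 150          # approximate size of each write

writes_per_sec = TRANSACTIONS_PER_SEC * WRITES_PER_TRANSACTION
bytes_per_sec = writes_per_sec * BYTES_PER_WRITE
megabytes_per_sec = bytes_per_sec / 1_000_000

print(f"{writes_per_sec:,} writes/s, {megabytes_per_sec:.0f} MB/s")
```

That works out to 480,000 writes/s and about 72 MB/s, which is consistent with the "~75 MB/s" estimate.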

Depending on what you are doing in your transaction, it may be possible to save some cycles by bypassing read your writes. This is done by using the read_your_writes_disable transaction option, and I don’t think it would cause any problems if you aren’t reading anything. If you are writing the same key multiple times in a transaction, then I think this may result in missing out on an optimization to collapse the duplicate entries.
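As a minimal sketch of what that looks like with the Python bindings: the helper name `write_batch_without_ryw` is made up for illustration, and `tr` is assumed to be a freshly created FoundationDB transaction (for example from `db.create_transaction()`). As far as I know, the option has to be set before any reads or writes are performed on the transaction.

```python
def write_batch_without_ryw(tr, items):
    """Write (key, value) pairs with read-your-writes disabled.

    `tr` is assumed to be a new FoundationDB transaction object; this
    helper is illustrative and not part of the fdb bindings.
    """
    # Set the option first, before any other operation on the transaction.
    tr.options.set_read_your_writes_disable()
    for key, value in items:
        tr.set(key, value)
```

The caller is still responsible for committing the transaction and handling retries.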

Another potential optimization would be to set a conflict range only once rather than for each write. This will likely increase the size of your conflict surface, but depending on the circumstances the effect may be minimal. You can do this by explicitly creating a conflict range to span all of your mutations and then disabling conflicts on all of your individual writes with the next_write_no_write_conflict_range option.

This option is tricky to use if you are doing writes on the same transaction from multiple threads, so I would probably avoid it in that case. You should be very careful that your big conflict range covers all of your writes, because if you miss one it could result in isolation failures. Also, I’ve never tried this optimization to improve client network thread performance, so I don’t know how much benefit it would have or if setting an option for each write would negate the gains.
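A rough sketch of the single-conflict-range idea with the Python bindings, under the same caveats: I have not measured whether it helps, the helper name `write_batch_single_conflict_range` is hypothetical, and `tr` is assumed to be a FoundationDB transaction used from one thread only. The one range runs from the smallest key to just past the largest key, so it covers every write in the batch.

```python
def write_batch_single_conflict_range(tr, items):
    """Write (key, value) pairs under one spanning write conflict range.

    `tr` is assumed to be a FoundationDB transaction used by a single
    thread; this helper is illustrative, not part of the fdb bindings.
    """
    items = list(items)
    if not items:
        return

    keys = [key for key, _ in items]
    begin = min(keys)
    # Range ends are exclusive, so append a zero byte to the largest key
    # to get the first key strictly after it.
    end = max(keys) + b"\x00"

    for key, value in items:
        # The option applies only to the next write, so set it every time.
        tr.options.set_next_write_no_write_conflict_range()
        tr.set(key, value)

    # One explicit range that covers every key written above.
    tr.add_write_conflict_range(begin, end)
```

Computing the bounds from the batch itself, rather than hard-coding them, is one way to avoid missing a write. The trade-off is that every key between the smallest and largest key is now part of the conflict surface, including keys the transaction never touched.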

Thanks! read_your_writes_disable increased my throughput a bunch. (At some point I edited my initial post to mention that I tried that.) The next_write_no_write_conflict_range option is an interesting idea. My use case is somewhat unusual in that we never read and write in the same transaction, so I actually set next_write_no_write_conflict_range and YOLO without any conflict ranges. That’s probably not generally advisable, but it is safe for our use case and is the only way to get the write throughput we need. Otherwise, we end up bottlenecked on the resolver.

I have a profile, but I’m not sure how actionable it is. The only thing that jumps out to me is that > 50% of time seems to be spent in N2::Net2::run, N2::Net2::processThread and SAV<Void>::send<Void> directly.