How to scale FoundationDB reads

I’m assuming that what you’re intentionally trying to determine here is the maximum number of linearizable point reads you can do. Unlike in many other databases, the first point read in an FDB transaction is significantly more expensive than the second because of the implicit GetReadVersion (GRV) call, and when there isn’t a second read, it’s really just a benchmark of GRVs, as you’ve discovered. In that case, you would need to squeeze out as much GRV throughput as possible, but admittedly, FDB isn’t great at this. If this workload is an accurate model of your production workload, then this is normally when I recommend something that does partitioned quorums, e.g. Cassandra, as a better fit.

If what you’re trying to benchmark is “my transactions do a lot of point reads; how does the throughput on that look?”, then doing many point reads per transaction (as AJ recommends) or re-using read versions (as Steve recommends) are both great directions to go.
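For illustration, here’s a minimal sketch of both approaches with the Python bindings. The API version, key names, and error handling are placeholders, and the exact future-handling idioms can differ a bit across binding versions:

```python
import fdb

fdb.api_version(630)  # use whichever API version matches your client library
db = fdb.open()


# Approach 1: amortize the implicit GRV over many point reads per transaction.
@fdb.transactional
def read_many(tr, keys):
    # Issue every point read up front so they run in parallel; only the first
    # one pays the GetReadVersion round trip, the rest reuse that read version.
    futures = [tr[k] for k in keys]
    return [f.wait() for f in futures]


# Approach 2: fetch one read version explicitly and reuse it across transactions.
def read_with_shared_version(db, keys):
    grv_tr = db.create_transaction()
    version = grv_tr.get_read_version().wait()  # the only explicit GRV call
    results = []
    for k in keys:
        tr = db.create_transaction()
        tr.set_read_version(version)  # skips the implicit GRV for this read
        results.append(tr[k].wait())
    # A read version only stays valid for roughly five seconds before reads
    # start failing with transaction_too_old, so don't hold onto it for long.
    return results
```

With the first approach, throughput becomes a question of storage server reads rather than GRVs, which is usually the thing people actually want to measure.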

EDIT: It’s possibly worth noting that on read-only transactions, fdb_transaction_commit is a no-op.

Upper bound of clients? Or upper bound of clients that are sitting in a tight loop issuing GetReadVersion requests?

I think the overall upper bound on clients is somewhere around 10k ±5k right now, at which point, regardless of your configuration, one of the coordinators or the cluster controller will fall over from trying to heartbeat so many connections. (There is actually a PR open right now that should improve this.)

For the latter case, I don’t think we have a performance test that does that, as you might have discovered.

If I wanted to benchmarket GRV throughput, I’d…

  • Find a machine with a fat NIC and a lot of cores.
  • Run all my proxies on one machine to remove the network latency between them.
  • Set CAUSAL_READ_RISKY on all my transactions (see the client-side sketch just below this list).
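As a rough sketch of what that client loop could look like with the Python bindings (the API version and loop shape are illustrative, and a real benchmark would run many of these processes in parallel):

```python
import time
import fdb

fdb.api_version(630)
db = fdb.open()


def grv_loop(seconds=10):
    # Tight loop issuing GetReadVersion requests, which is all a linearizable
    # point-read benchmark reduces to when there is no second read.
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        tr = db.create_transaction()
        tr.options.set_causal_read_risky()  # let proxies skip confirming TLogs
        tr.get_read_version().wait()
        count += 1
    return count / seconds


if __name__ == "__main__":
    print("GRVs/sec from this process:", grv_loop())
```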

Each proxy needs to talk to all the other proxies to calculate the read version (and to a few TLogs, if you don’t let us break causal consistency under faults via CAUSAL_READ_RISKY), so you want to minimize the latency between them. This ‘everyone talks to everyone’ behavior also means GRV throughput doesn’t scale linearly with the number of proxies. You’re very unlikely to max out the bandwidth of one machine with just GRV requests, so you’d start from one proxy and increment until you find the number that handles the most requests without causing too much extra proxy-to-proxy communication.
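One way to run that sweep, sketched with fdbcli driven from Python. This assumes an FDB version where proxies are still a single role (newer releases split them into GRV and commit proxies), and the settle delay and the imported grv_benchmark helper are placeholders:

```python
import subprocess
import time

# Hypothetical helper module wrapping the grv_loop() sketch shown earlier.
from grv_benchmark import grv_loop

for n in range(1, 9):
    # 'configure proxies=N' changes the proxy count on older FDB versions.
    subprocess.run(["fdbcli", "--exec", f"configure proxies={n}"], check=True)
    time.sleep(30)  # crude placeholder for waiting until recovery finishes
    print(f"proxies={n}: {grv_loop():.0f} GRVs/sec")
```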

This would not be an advisable setup to run in production, though. It’s the exact opposite of how you’d set up a cluster for a write-bandwidth test: every commit you do would need to flow into and out of that one machine, and commits, since they carry actual data, will stress your network bandwidth much sooner. We also generally don’t recommend using CAUSAL_READ_RISKY by default, and not using it would require sending requests across the network to all your transaction logs, likely defeating some of the benefit of gathering all your proxies on one machine.

I can’t think of any. Increasing that sounds like a solid improvement to your benchmark.

Adding knob modifications to the benchmarketing above: just crank up all the batching done on both the client and the proxies.
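Mechanically, that means server knobs on the fdbserver command line and client knobs through the network option. A sketch of the plumbing follows; the knob names below are examples only and change between FDB versions, so check ClientKnobs.cpp and Knobs.cpp in your release for the real names and defaults:

```python
import fdb

fdb.api_version(630)

# Client-side knobs go through the 'knob' network option as "name=value"
# strings, set before fdb.open() starts the network.  max_batch_size is an
# example of a client GRV-batching knob; the name and a sensible value
# depend on your FDB version.
fdb.options.set_knob("max_batch_size=2000")

db = fdb.open()

# Proxy-side batching is controlled by server knobs passed to fdbserver, e.g.
#   fdbserver ... --knob_start_transaction_batch_interval_max=0.002
# (again, an example name; verify it against your version's Knobs.cpp).
```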