I’m trying to better understand the FDB performance model relative to something like HBase / BigTable. With an HBase table, a lot of attention has to be paid to generating even / relevant region splits at table creation + hashing the row keys for good distribution + potentially salting heavily hit rowkey ranges, all in order to avoid hotspotting individual region servers. Do those same types of locality considerations carry over to FDB?
Some questions that come to mind:
- if there were a single key that's inundated with write requests, what kind of trouble could that cause overall? Does it degrade the performance of the replicas that own that key, or could it push them over entirely? What does it mean for the rest of the cluster?
- same question as above, but for a read request deluge: https://apple.github.io/foundationdb/known-limitations.html#limited-read-load-balancing suggests some small degree of load balancing, but how do the nodes degrade if the replicas aren't sufficient?
- if an application generates sequential keys but never needs range reads, is it worthwhile to hash the keys preemptively for even distribution? e.g. rather than storing `(subspace, sequential_key)`, instead store `(subspace, hash(sequential_key))` when the application only needs multi-gets.
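To make the layout I'm asking about concrete, here's a sketch (illustrative Python with `hashlib` only, not the real fdb bindings — actual code would go through the tuple layer / `Subspace`; the subspace name `events` and the 8-byte digest truncation are arbitrary choices of mine):

```python
import hashlib

def hashed_key(subspace: bytes, sequential_key: int) -> bytes:
    """Derive an evenly distributed key by hashing the sequential id.
    Illustrative only: real code would use fdb's tuple layer, but the
    key-layout idea is the same."""
    digest = hashlib.sha256(str(sequential_key).encode()).digest()[:8]
    return subspace + b"/" + digest

# Consecutive ids map to unrelated points in the keyspace, so writes
# spread across shards instead of all landing at the "end" of the
# subspace -- at the cost of losing any ability to range-read in order.
keys = [hashed_key(b"events", i) for i in range(3)]
```

The question is whether that trade is ever worth making in FDB, given that the app only ever multi-gets known ids.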
- when FDB is shuffling data to rebalance, how does it choose which keys to shuffle? Does it do it on a per-key, per-subspace, or some-other-boundary basis? https://apple.github.io/foundationdb/features.html#load-balancing suggests some level of continuous movement, but it's not clear how those decisions are made.
- suppose there's a write-intensive application that writes heavily to particular subspaces at a time. If / when does it become worthwhile to split those writes across multiple subspaces?
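For the subspace-splitting question, the kind of scheme I'd imagine is prefix sharding (again an illustrative stdlib-only sketch, not fdb API; `NUM_SHARDS`, the `logs` base, and the `/`-joined layout are all my own placeholders):

```python
import hashlib

NUM_SHARDS = 8  # assumption: tuned to write volume

def sharded_subspace(base: bytes, key: bytes) -> bytes:
    """Route a key into one of NUM_SHARDS sub-subspaces so concurrent
    writes spread over several shard ranges instead of hammering one."""
    shard = int.from_bytes(hashlib.md5(key).digest()[:2], "big") % NUM_SHARDS
    return base + b"/" + str(shard).encode() + b"/" + key
```

The obvious cost is that reads now have to fan out across all shards (a union of range reads, or multi-gets against every shard prefix), so I'm curious at what write intensity that complexity starts to pay off.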
I’m sure I’ll have more questions as I continue to dig in… very excited to see FDB alive and well again