Hi, it’s really exciting that Apple has published a paper about FDB at SIGMOD ’21. I learned a lot from FDB’s design principles, such as decoupling the read and write paths and decoupling logging from recovery.
While reading, a few questions came up:
- In order to avoid gaps in the serial history defined by commit versions, the proxy must send the LSN and the previous LSN to the Resolver. Does this mean the proxy must broadcast to all resolvers, even if some resolver is not responsible for any key range accessed by the committing transaction (just like the broadcast to all log servers)?
- Although the previous LSN can tell a log server that there may be a gap in the received logs, it is possible that the network between the corresponding proxy and the log server fails, so the proxy cannot deliver logs to that log server. How does the log server handle this situation? Does it gossip with other log servers to fetch the missing logs, or simply wait long enough (e.g., 5 s) and then trigger a recovery?
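  To make concrete what I mean by the gap check, here is a tiny sketch (purely my own assumption of the mechanism, not FDB's actual code) of how a log server could use the previous-LSN field to append in order and buffer messages that arrive ahead of a gap:

  ```python
  # Hypothetical sketch: a log server appends log messages strictly in LSN
  # order, using each message's previous-LSN field to detect gaps.
  class LogServer:
      def __init__(self):
          self.last_lsn = 0   # highest LSN durably appended so far
          self.pending = {}   # prev_lsn -> (lsn, payload), buffered out of order

      def receive(self, prev_lsn, lsn, payload):
          """Buffer the message, then drain everything that is now in order."""
          self.pending[prev_lsn] = (lsn, payload)
          appended = []
          # Drain every message whose predecessor is already durable.
          while self.last_lsn in self.pending:
              next_lsn, data = self.pending.pop(self.last_lsn)
              appended.append(next_lsn)   # stand-in for a durable append
              self.last_lsn = next_lsn
          return appended                 # empty list => a gap is still open
  ```

  In this model, `receive(20, 30, ...)` arriving while `last_lsn == 10` just sits in `pending` until the message with LSN 20 shows up; my question is what happens when it never shows up because of a network fault.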
- In the evaluation of Figure 8(a), how are the numbers of proxies and log servers chosen (2 to 22)? What is the number of resolvers? Is there any contention in the workload? Why does the read throughput not scale to roughly 6x?
- In the evaluation of Figure 8(b), why does the operation throughput not scale to roughly 6x? Do the resolvers and proxies hit their bottlenecks simultaneously?