Reasons for not co-locating tlog and SS? IO characteristics of SS

For #1, two other considerations are:

  • The logs call fsync() for every commit version, which means hundreds of times per second, while storage servers call it only once or twice per second. I think most drives incur some small hiccup in performance while an fsync is pending.
  • An SSD’s write performance per pattern (linear vs random) under a mixed linear+random workload is usually not the same as what each workload can achieve individually. I’m not entirely sure why this is, but it’s a thing. In other words, if a drive can do 300MB/s of linear writes or 50MB/s of random writes, then doing 25MB/s of random writes (half the random budget) does not leave 150MB/s (half the linear budget) for linear writes. What remains is something less.
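To make the arithmetic in the second point concrete, here is a minimal sketch of the naive "proportional budget" model that real drives tend to fall short of. The function name and the proportional model itself are assumptions for illustration only; the result is an upper bound under that model, not a measurement of any drive.

```python
def naive_remaining_linear(linear_max, random_max, random_used):
    """Linear write budget (MB/s) left over IF the two budgets traded off
    proportionally. This is the optimistic model, not observed behavior."""
    if not 0 <= random_used <= random_max:
        raise ValueError("random_used must be between 0 and random_max")
    fraction_of_random_budget_used = random_used / random_max
    return linear_max * (1 - fraction_of_random_budget_used)


# Numbers from the example above: 300MB/s linear, 50MB/s random,
# with 25MB/s of random writes in flight.
naive = naive_remaining_linear(300.0, 50.0, 25.0)
print(naive)  # 150.0, the naive expectation; a real drive delivers less
```

The point of the original example is that a tlog's linear writes sharing a drive with a storage server's random writes should be budgeted below this figure, by an amount that has to be measured on the actual hardware.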

For #2, I certainly agree that storage server I/O characteristics should be better explained, and in one place. Probably the single most detailed source of this information right now is my presentation and slide deck from the 2019 summit. The slides can be found here; the video is not yet linked but should be soon. FoundationDB Summit 2019: Redwood Storage Engine Update

Regarding the write queue depth: FDB uses SQLite on top of a file caching layer that holds all writes in memory until commit time and then issues them to disk all at once. This coalesces multiple writes to the same pages within a commit cycle. So yes, the write queue depth is large while writes are being issued, but for much of the time no writes are being issued, and the bottleneck is the single-threaded writer reading the uncached pages it needs as it traverses the tree for each mutation and applies its changes.
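The coalescing behavior described above can be sketched roughly as follows. This is a toy model, not FDB's actual caching layer: the class name, its fields, and the dict standing in for the file are all hypothetical, and it only illustrates why repeated writes to one page cost a single disk write per commit.

```python
class CoalescingPageCache:
    """Toy write-back cache: buffers page writes until commit (hypothetical)."""

    def __init__(self):
        self.dirty = {}            # page number -> latest contents
        self.disk = {}             # stand-in for the on-disk file
        self.writes_requested = 0  # writes the tree code asked for
        self.writes_issued = 0     # writes that actually reached "disk"

    def write(self, page_no, data):
        # A later write to the same page replaces the earlier one in memory.
        self.writes_requested += 1
        self.dirty[page_no] = data

    def commit(self):
        # Issue every dirty page in one burst; this is the moment the
        # write queue depth is large.
        batch = sorted(self.dirty.items())
        for page_no, data in batch:
            self.disk[page_no] = data
        self.writes_issued += len(batch)
        self.dirty.clear()
        return len(batch)


cache = CoalescingPageCache()
for version in (b"v1", b"v2", b"v3"):
    cache.write(7, version)    # same page modified three times
cache.write(9, b"other")
pages_flushed = cache.commit()
print(pages_flushed, cache.writes_requested, cache.writes_issued)  # 2 4 2
```

Between commits this model issues no writes at all, which matches the observation that the disk spends much of its time serving the writer's reads of uncached pages rather than absorbing writes.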
