Impact of workload concurrency on write amplification


We recently ran YCSB workloads on the latest FoundationDB (with the Redwood storage engine) on SSDs with built-in hardware-based transparent compression, and measured how much transparent compression reduces the physical dataset footprint and write amplification. One thing we observed is that workload concurrency (i.e., the number of YCSB clients) has a significant impact on write amplification. For example, under 100%-update YCSB with 64B record size and 4KB page size, write amplification (i.e., total_write_IO_traffic_volume / total_size_of_updated_records) increases by 3x when we reduce the number of clients from 32 to 4. Could anyone please shed light on what the reason could be? We are interested in write amplification (in addition to storage capacity) because emerging QLC flash has very limited cycling endurance, so reducing write amplification is critical. Thanks.

Tong Zhang, ScaleFlux
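
For readers who want the metric pinned down, here is a minimal sketch of the write amplification ratio as defined in the post above. The function name and the example numbers are illustrative assumptions, not measurements from the experiment; the example assumes every 64B update causes exactly one 4KB page write and nothing else.

```python
def write_amplification(bytes_written_to_device: int, updated_record_bytes: int) -> float:
    """Ratio of total write I/O traffic to the total size of updated records."""
    if updated_record_bytes <= 0:
        raise ValueError("updated_record_bytes must be positive")
    return bytes_written_to_device / updated_record_bytes


# Hypothetical example (not measured data): 1,000,000 updates of 64-byte
# records, each causing exactly one full 4 KB page write.
updates = 1_000_000
record_size = 64
page_size = 4096

wa = write_amplification(updates * page_size, updates * record_size)
print(wa)  # 64.0
```

Under that assumption the ratio is simply page size divided by record size, which is why small records on 4KB pages start from a high baseline.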

When you reduce the client count, doesn’t that also decrease throughput? Unless throughput stays the same, what you are seeing is completely expected for a BTree under a small random update workload.

The more writes there are, the more likely it is that more than one of the writes in a given commit will fall on the same page. When that happens, write amplification is lower because more KV bytes are updated per 4KB page written.
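
The effect described above can be sketched with a simple occupancy model. This is only an illustration under stated assumptions: updates are uniform over `num_pages` leaf pages, each touched page is written once per commit, and internal BTree pages, logging, and compression are ignored. The function names and the tree size are hypothetical and are not taken from Redwood.

```python
import math


def expected_distinct_pages(num_pages: int, updates_per_commit: int) -> float:
    """Expected number of distinct pages hit by uniform random updates.

    Equals N * (1 - (1 - 1/N)**k), computed with log1p/expm1 for accuracy.
    """
    if num_pages < 2:
        raise ValueError("num_pages must be at least 2")
    return -num_pages * math.expm1(updates_per_commit * math.log1p(-1.0 / num_pages))


def leaf_write_amplification(num_pages: int, updates_per_commit: int,
                             record_size: int = 64, page_size: int = 4096) -> float:
    """Page bytes written per record byte updated, counting leaf pages only."""
    pages = expected_distinct_pages(num_pages, updates_per_commit)
    return pages * page_size / (updates_per_commit * record_size)


# Hypothetical small tree of 1,000 pages, so page sharing is easy to see.
for batch in (1, 10, 100, 1_000, 10_000):
    print(batch, round(leaf_write_amplification(1_000, batch), 2))
```

In this model the amplification starts at page_size / record_size for a single update per commit and falls as more updates share each page write, which is the trend the reply describes.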

Thanks for your response. Yes, the overall throughput drops as the client count is reduced. The dataset is about 200GB, and the updates are distributed uniformly over the entire dataset. So with either 32 or 4 clients, there should be only a small probability that multiple updates fall into the same 4KB page in a given commit. A similar write amplification difference was also observed when we increased the record size from 64B to 256B. Could the write amplification difference partly come from the WAL or the pager (e.g., a lower client count may cause larger write amplification on WAL or pager writes, since each WAL/pager write to the SSD must be 4KB)?
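
Both points in this post can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration under assumptions, not a description of how FoundationDB's log or pager works: it takes the number of updates per commit to equal the client count (the thread does not state the actual value), treats the dataset as 200 * 10^9 bytes of 4KB pages, and models a log flush as the raw records padded up to a 4KB boundary with no headers. All function names are hypothetical.

```python
import math

PAGE_SIZE = 4096


def expected_distinct_pages(num_pages: int, updates: int) -> float:
    """Expected distinct pages hit by uniform updates: N * (1 - (1 - 1/N)**k)."""
    return -num_pages * math.expm1(updates * math.log1p(-1.0 / num_pages))


def padded_log_amplification(records_per_flush: int, record_size: int,
                             block_size: int = PAGE_SIZE) -> float:
    """Bytes written per payload byte when a flush is padded to whole blocks."""
    payload = records_per_flush * record_size
    blocks = math.ceil(payload / block_size)
    return blocks * block_size / payload


num_pages = 200 * 10**9 // PAGE_SIZE  # about 48.8 million pages

results = {}
for batch in (4, 32):  # assumed updates per commit, one per client
    # Fraction of updates that land on a page another update already touched.
    shared = 1 - expected_distinct_pages(num_pages, batch) / batch
    results[batch] = (shared,
                      padded_log_amplification(batch, 64),
                      padded_log_amplification(batch, 256))
    print(batch, shared, results[batch][1], results[batch][2])
```

Under these assumptions page sharing is negligible at both batch sizes, consistent with the point made in the post, while the padded flush goes from 16x at 4 records to 2x at 32 records for 64B records. If real commits batch far more updates than the client count, both numbers change accordingly.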