I have a question about high disk busy on the FDB transaction logs in one of our production deployments.
Our deployment is a 3-DC, two-region setup running FDB 6.2.27. We found that with 15 transaction logs in each DC, disk busy on the transaction logs is high.
We added 15 more transaction logs, hoping to reduce the disk busy on each transaction log, but it did not help. Both the old and the new transaction logs still show about the same disk busy as before.
I don’t see any increase in traffic that can explain the behavior.
I noticed that Section 2.4.3 of the FDB paper says:
“After a Proxy decides to commit a transaction, the log message is broadcast to all LogServers. As illustrated in Figure 2, the Proxy first consults its in-memory shard map to determine the StorageServers responsible for the modified key range. Then the Proxy attaches StorageServer tags 1, 4, and 6 to the mutation, where each tag has a preferred LogServer for storage. In this example, tags 1 and 6 have the same preferred LogServer. Note the mutation is only sent to the preferred LogServers (1 and 4) and an additional LogServer 3 to meet the replication requirements. All other LogServers receive an empty message body. The log message header includes both LSN and the previous LSN obtained from the Sequencer, as well as the known committed version (KCV) of this Proxy.”
To my understanding, besides the 3 LogServers that persist the mutation, all other LogServers also receive a message with an empty body. I suspect these LogServers still need to persist the message header to disk. Is my understanding correct? If so, would that explain the behavior above? In our system each write transaction is small, storing only a few small key-value pairs.
Also, if that is the case, does it mean we cannot scale write throughput any further once a transaction log server reaches the disk bottleneck?
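To make my hypothesis concrete, here is a rough back-of-the-envelope sketch in Python. It is not FDB code. It assumes that every commit batch results in one disk write on every LogServer (header only for the non-preferred ones), which is exactly the part I am unsure about. The function name, the header size, and the payload size are all made-up illustrative values, not measurements from our cluster.

```python
def per_log_load(commit_batches_per_sec, num_logs, replication=3,
                 payload_bytes=200, header_bytes=50):
    """Hypothetical model of the load on ONE log server.

    Assumption (unverified): each commit batch is broadcast to all logs,
    and only `replication` of them receive the mutation payload.
    """
    # Every log sees one message per commit batch, regardless of num_logs.
    writes_per_sec = commit_batches_per_sec

    # Payload bytes are spread over all logs; header bytes are not.
    payload_share = commit_batches_per_sec * payload_bytes * min(replication, num_logs) / num_logs
    header_share = commit_batches_per_sec * header_bytes
    bytes_per_sec = header_share + payload_share

    return writes_per_sec, bytes_per_sec


if __name__ == "__main__":
    # Compare the old and new log counts at the same (illustrative) commit rate.
    for logs in (15, 30):
        writes, nbytes = per_log_load(1000, logs)
        print(f"{logs} logs: {writes} writes/s per log, {nbytes:.0f} bytes/s per log")
```

If this model is right, doubling the number of logs lowers the bytes written per log but leaves the number of writes per log unchanged. With small transactions the per-write cost would dominate, which would match what we observe.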