Understanding slow log servers

I have a write-heavy workload where at steady state regular writes are sometimes throttled and batch transactions are somewhat consistently throttled. Looking at the output of status json, it says that the factor limiting batch transactions is a long log queue. From looking at the ProcessMetrics, MachineMetrics and our system metrics that the problem is not a network or CPU bottleneck. My working hypothesis is that the process is bottlenecked on the disk, but I’m wondering how best to measure that. For storage servers, the server can maintain a deep enough disk queue depth that % of AWS published IOPs is a meaningful measure. The tlogs seem to intentionally not build up a queue depth, so I’m not quite sure what metric to be looking at.