Many failed transactions during replication

When I have a cluster doing nothing but replication, it sometimes looks as though most of the replication transactions are failing. An example status:

Workload:
  Read rate              - 4568 Hz
  Write rate             - 17 Hz
  Transactions started   - 84 Hz
  Transactions committed - 2 Hz
  Conflict rate          - 0 Hz

Am I reading too much into the transactions started/committed stats? Or is something actually going wrong here?

I believe data distribution only runs a transaction when its work queue changes, which is approximately when a shard is queued, started, or finished, so the committed transaction rate would not be high. Two shards changing state per second is reasonable.
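
To put the numbers from the status above side by side, here is a minimal Python sketch that parses the pasted workload block and compares the started, committed, and conflict rates. The `parse_workload` helper and the `STATUS` string are hypothetical names used only for illustration; this is not a FoundationDB API. It only does the arithmetic and does not by itself say why the uncommitted transactions did not commit.

```python
import re

# The workload block from the status output quoted above, pasted verbatim.
STATUS = """\
Workload:
  Read rate              - 4568 Hz
  Write rate             - 17 Hz
  Transactions started   - 84 Hz
  Transactions committed - 2 Hz
  Conflict rate          - 0 Hz
"""

def parse_workload(text):
    """Return {metric name: rate in Hz} for lines shaped like 'Name - 123 Hz'."""
    rates = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s+-\s+(\d+(?:\.\d+)?)\s*Hz\s*$", line)
        if m:
            rates[m.group(1)] = float(m.group(2))
    return rates

rates = parse_workload(STATUS)
started = rates["Transactions started"]
committed = rates["Transactions committed"]
conflicts = rates["Conflict rate"]

# Transactions that were started but never show up as committed.
not_committed = started - committed

print(f"started:       {started:.0f} Hz")
print(f"committed:     {committed:.0f} Hz")
print(f"not committed: {not_committed:.0f} Hz")
print(f"conflicts:     {conflicts:.0f} Hz")
```

With a conflict rate of 0 Hz, the gap between started and committed does not appear to come from conflicting commits, which fits the reading that a low committed rate here is not a sign of failures.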
