We’ve been seeing a lot of Transaction Cancelled errors (Operation aborted because transaction was cancelled: 1025) lately. Six Java clients each commit transactions of ~500 KB in a loop; the transactions are identical in structure, but each one is unique. We have not set any timeout options on these transactions. We were running this as a performance test, and the observed write rate was ~25000 kHz.
Our cluster configuration:
- 5 machines
- triple replication
- 1 SSD per machine
The process class configuration:
- Machine 1: 2 storage + 1 proxy + 1 stateless
- Machine 2: 2 storage + 1 proxy + 1 stateless
- Machine 3: 2 storage + 1 proxy + 1 stateless + 1 log
- Machine 4: 2 storage + 1 stateless + 1 log
- Machine 5: 2 storage + 1 stateless + 1 log
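
To make the layout above easier to reason about, here is a minimal Java sketch that tallies the process classes across the five machines. The class and method names (`ClusterLayout`, `machine`, `layout`, `totals`) are made up for illustration; only the per-machine counts come from the list above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterLayout {
    // Builds the process-class counts for one machine (hypothetical helper).
    static Map<String, Integer> machine(int storage, int proxy, int stateless, int log) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("storage", storage);
        m.put("proxy", proxy);
        m.put("stateless", stateless);
        m.put("log", log);
        return m;
    }

    // The five machines, with counts copied from the list above.
    static List<Map<String, Integer>> layout() {
        return List.of(
            machine(2, 1, 1, 0),  // Machine 1
            machine(2, 1, 1, 0),  // Machine 2
            machine(2, 1, 1, 1),  // Machine 3
            machine(2, 0, 1, 1),  // Machine 4
            machine(2, 0, 1, 1)); // Machine 5
    }

    // Sums the per-machine counts into cluster-wide totals per process class.
    static Map<String, Integer> totals(List<Map<String, Integer>> machines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map<String, Integer> m : machines) {
            for (Map.Entry<String, Integer> e : m.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> t = totals(layout());
        int all = t.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(t + ", total processes: " + all);
    }
}
```

In total that is 10 storage, 3 proxy, 5 stateless and 3 log processes (21 overall). With 1 SSD per machine, each disk is shared by two storage processes, and on machines 3 to 5 by a log process as well.
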
Interestingly, our logs suggest that some of these transactions had already been committed by the time the error was reported.
Any insight into this would be helpful!