Transaction size limit calculation

According to the documentation:

Transaction size cannot exceed 10,000,000 bytes of affected data. Keys, values, and ranges that you read or write are all included as affected data.

From reading the code and testing it myself, the size of the values you read does not seem to be part of the calculation. I wrote 15 MB of data into a range across multiple transactions, read it all back in a single transaction with one large range read, and did not encounter transaction_too_large as I would have expected.
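Roughly, the test was something like this sketch using the Python bindings (the API version, key prefix, and chunk size here are illustrative):

```python
import fdb

fdb.api_version(620)
db = fdb.open()

PREFIX = b'size-test/'   # illustrative key prefix
CHUNK = 90_000           # keep each value and each write transaction small

@fdb.transactional
def write_chunk(tr, i):
    # each call commits its own small transaction
    tr[PREFIX + b'%010d' % i] = b'x' * CHUNK

@fdb.transactional
def read_all(tr):
    # one large range read covering everything written above
    return sum(len(v) for _, v in tr.get_range(PREFIX, PREFIX + b'\xff'))

# Write ~15 MB of values across many transactions, then read it back in one.
for i in range(15_000_000 // CHUNK + 1):
    write_chunk(db, i)

print(read_all(db))   # completed without transaction_too_large
```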

From reading the code, it seems like the only way to hit this limit on reads would be to read so many individual keys or ranges that they add up to more than 10 MB of keys, not values.
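As a back-of-the-envelope number (assuming a point read adds a read conflict range of begin = key and end = key + b'\x00', i.e. about twice the key length plus one byte, which matches the write accounting mentioned later in this thread):

```python
# With 100-byte keys, each individual get contributes roughly
# 2 * 100 + 1 = 201 bytes of read conflict range keys, so you'd need
# on the order of 50,000 distinct point reads in a single transaction
# before the conflict ranges alone pass 10,000,000 bytes.
key_len = 100
per_read = 2 * key_len + 1
print(10_000_000 // per_read)   # ~49,751
```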

Additionally, from my understanding of the internals, the size of values read shouldn't be an issue in the same way that writes and conflict ranges are. Large writes are hard for obvious reasons, and many individual conflict ranges make conflict detection take longer. Reads, on the other hand, are served by the storage servers, which don't do much more than return the data.

Is my test wrong somehow, or could the documentation be more specific about how the calculation is done?

The use case I'm interested in: if the new storage engine enables longer-running transactions through more sophisticated multi-versioning, snapshot reads over lots of data would become possible. Making this part of the documentation clearer would make that use case more obvious to other users.

Edit: Additionally, can you encounter this error at all in a read-only transaction, given that read-only transactions don't commit?


I believe the transaction size limit only applies to mutations; reads don't hit that limit (they would fail, of course, if you ask for a past version). Pure read transactions can't conflict.

Right, so that sentence in the documentation isn’t accurate with respect to value sizes being a part of the size limit.

Unless anyone can clarify why that is in the docs, I’ll open a PR to change that language sometime soon.

The documentation isn’t very clear, but it’s trying to say that keys and values that you write count toward your limit. Also, all of your read and write conflict ranges count against the limit too. A conflict range consists of two keys that mark the beginning and end of a range that is read or written.

And I guess to be more precise, it’s really the size of your mutations that counts, not the size of the keys and values. For sets these are the same, but for clears and clear ranges the mutation consists only of keys. For atomic ops, the size of the value you pass in matters, not the size of the resulting value in the database.
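To put illustrative numbers on that (this is my reading of the rules described above, not something verified against the source):

```python
key, value = b'k' * 100, b'v' * 1000
begin_key, end_key = b'a' * 100, b'b' * 100

set_size         = len(key) + len(value)          # 1100: key and value both count
clear_size       = len(key)                       #  100: a clear carries only the key
clear_range_size = len(begin_key) + len(end_key)  #  200: just the two boundary keys
atomic_add_size  = len(key) + 8                   #  108: the 8-byte operand you pass in
                                                  #       counts, not the stored result
```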

Also, I believe that for writes you need to triple the size of the key (not the value), since the key is also added to a write conflict range, where it appears twice (technically 3x + 1 byte). It might be helpful to expose expectedSize() to clients, which does the actual summing of mutations + readConflictRanges + writeConflictRanges.
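Putting that together for a single set (a sketch of the accounting as described in this thread; not the real expectedSize(), and the exact bookkeeping may differ):

```python
def estimated_set_cost(key, value):
    mutation       = len(key) + len(value)   # the write itself
    write_conflict = 2 * len(key) + 1        # [key, key + b'\x00'): the key appears
                                             # twice, plus one extra byte
    return mutation + write_conflict         # ~= 3 * len(key) + len(value) + 1

print(estimated_set_cost(b'k' * 100, b'v' * 1000))   # 1301
```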
