We re-use the same transaction.
We are doing something similar, but only copying the read version. In our tests, performance is similar for both variants (reusing the read version or reusing the transaction).
This is very different, and has orders of magnitude more overhead (because there are 1000x more requests).
One of the selling points is intelligent batching, but it seems that it is not implemented for reads? If every key is on the same storage server (SS), then the workload on that server is about the same as for a range read. The only extra overhead is on the network, and that could be avoided or batched. It is just very surprising that this is not the case.
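To make the request-count argument concrete, here is a toy model in Python. It is only a sketch of the arithmetic, not FoundationDB client code: `shard_for`, `count_requests`, the key layout, and the shard boundaries are all hypothetical names and values chosen for illustration. It counts one request per key for unbatched point reads, and one request per storage server shard if point reads were grouped by the shard that owns them.

```python
from bisect import bisect_right
from collections import defaultdict


def shard_for(key, boundaries):
    """Return the index of the shard owning `key`.

    `boundaries` is a sorted list of split keys (hypothetical layout):
    shard 0 holds keys below boundaries[0], shard 1 holds keys from
    boundaries[0] up to boundaries[1], and so on.
    """
    return bisect_right(boundaries, key)


def count_requests(keys, boundaries, batched):
    """Count network requests needed to read `keys` in this toy model."""
    if not batched:
        # One request per point read.
        return len(keys)
    # Group keys by owning shard; one request per shard touched.
    by_shard = defaultdict(list)
    for key in keys:
        by_shard[shard_for(key, boundaries)].append(key)
    return len(by_shard)


# 1000 point reads that all land on the same (hypothetical) shard.
keys = [b"user/%04d" % i for i in range(1000)]
boundaries = [b"user/", b"v"]

unbatched = count_requests(keys, boundaries, batched=False)
batched = count_requests(keys, boundaries, batched=True)
print(unbatched, batched)
```

In this model the storage server does about the same work in both cases; the difference is only in how many round trips carry the keys, which is the overhead I would have expected batching to remove.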