How to achieve (1000 tx, 1000 reads each)/sec on a single client

That’s a really important difference. Another really important difference is that the range read in your example is one request out with one very large response returned, whereas you are trying to send 1000 individual requests and receive 1000 individual responses. That is a very different workload. Because there are 1000x as many requests, it carries orders of magnitude more overhead in the client: cross-thread synchronization, network serialization and sending, and the receive-side work of matching each response to its request. Responses do not come back 1:1 in request order, and if the replication mode is higher than single, they do not even necessarily arrive over the same TCP connection.
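To make the message-count difference concrete, here is a toy model in Python. It is not how the FoundationDB client is actually implemented: `point_reads`, `range_read`, `Response`, and the `num_connections` parameter are all hypothetical, and the "connections" are just shuffled lists. It only shows that N point reads mean N requests, N responses, and a pending-request table to match out-of-order replies, while a range read is one round trip.

```python
import random
from dataclasses import dataclass


@dataclass
class Response:
    request_id: int  # hypothetical tag the client uses to match a reply to its request
    payload: object


def point_reads(keys, store, num_connections=3, seed=0):
    """Toy model: one request per key; replies come back shuffled and
    spread over several simulated connections."""
    rng = random.Random(seed)
    pending = {}  # request_id -> key: the client's matching table
    outbound = 0
    for request_id, key in enumerate(keys):
        pending[request_id] = key
        outbound += 1  # one message sent per key
    # Simulate replies arriving out of order and across different connections.
    replies = [Response(rid, store.get(key)) for rid, key in pending.items()]
    rng.shuffle(replies)
    connections = [replies[i::num_connections] for i in range(num_connections)]
    results = {}
    inbound = 0
    for conn in connections:
        for reply in conn:
            key = pending.pop(reply.request_id)  # match the reply to its request
            results[key] = reply.payload
            inbound += 1  # one message received per key
    return results, outbound + inbound


def range_read(begin, end, store):
    """Toy model: one request out, one (large) response back."""
    rows = {k: v for k, v in sorted(store.items()) if begin <= k < end}
    return rows, 2  # exactly two messages regardless of how many rows
```

With `store = {f"key{i:04d}": i for i in range(1000)}`, both `point_reads(list(store), store)` and `range_read("key0000", "key1000", store)` return the same 1000 rows, but the first exchanges 2000 messages and the second exchanges 2. In a real client each of those extra messages also pays the synchronization and serialization costs described above, which this model does not measure.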