If I have to perform a very large range scan, say tens of millions of rows from a single client, what is the suggested mechanism for doing it without running into the 5-second transaction limit?
Some more details:
- The scan covers on the order of tens of millions of rows; the total size could run into a few GB.
- The key distribution is not precisely known in advance.
- Even if FDB could return this much data to a single client within 5 seconds, assume that the client cannot hold all of the returned data in memory and must apply some processing to the incoming stream. Due to the amount of computation required, this processing can itself exceed 5 seconds.
- It is acceptable for the client not to see any data written after it started the read operation.
- Very likely there will not be any concurrent writes to the range while this scan is going on; however, if they do occur, it would be preferable not to abort those concurrent write transactions.
What I want is to create an iterator over the result of a single rangeRead() whose total result exceeds the 5-second time limit and the 1,000,000-byte limit.
A few challenges here:
- How do I break a single rangeRead into multiple rangeReads (batching)?
- How do I manage my transaction? Should I close/commit the transaction at the end of each batch and start a new one at the start of the next?
- Are snapshot reads the only possible strategy? Does FoundationDB support other kinds of read-only transactions?
Any pseudocode would be highly appreciated.
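To make the batching idea concrete, here is a runnable sketch of the control flow I have in mind: read up to a limit of rows in one short-lived transaction, remember the last key returned, then resume a fresh transaction from just past that key. A plain sorted dict stands in for the FDB client here; in a real implementation `read_batch` would be a new (snapshot) transaction issuing a limited `get_range`, and the names `read_batch`/`scan_range` are my own, not FDB API.

```python
from typing import Dict, Iterator, List, Tuple

def read_batch(store: Dict[bytes, bytes], begin: bytes, end: bytes,
               limit: int) -> List[Tuple[bytes, bytes]]:
    """Stand-in for one short-lived read transaction over [begin, end)."""
    rows = [(k, v) for k, v in sorted(store.items()) if begin <= k < end]
    return rows[:limit]

def scan_range(store: Dict[bytes, bytes], begin: bytes, end: bytes,
               limit: int) -> Iterator[Tuple[bytes, bytes]]:
    cursor = begin
    while True:
        # In real FDB: open a new transaction here, so each batch stays
        # well under the 5-second / byte limits.
        batch = read_batch(store, cursor, end, limit)
        if not batch:
            return
        yield from batch
        # Resume strictly after the last key returned ("first greater than").
        cursor = batch[-1][0] + b"\x00"

store = {b"a": b"1", b"b": b"2", b"c": b"3", b"d": b"4", b"e": b"5"}
print(list(scan_range(store, b"a", b"z", limit=2)))
# → [(b'a', b'1'), (b'b', b'2'), (b'c', b'3'), (b'd', b'4'), (b'e', b'5')]
```

Note this gives per-batch consistency only, not one snapshot across the whole scan; whether that is acceptable, or whether there is a better way to pin a read version across batches, is part of my question.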