Range reads greater than 5 sec and 10000000 bytes


If I have to perform a really large range scan (assume a few tens of millions of rows) from a single client, what is the suggested mechanism to avoid running into the 5-second transaction limit?

Some more details:

  • The scan covers on the order of tens of millions of rows; the actual bytes could run into a few GB.
  • The key distribution is not precisely known in advance.
  • Even if FDB could return this much data to a single client within 5 seconds, assume the client cannot hold all of the returned data in memory and will apply some processing to the incoming stream. Due to the amount of computation required, this processing can overflow 5 seconds.
  • It is acceptable that the client does not see any new data written after it started the read operation.
  • Very likely there will not be any concurrent writes to the range being read while this scan is going on; however, if they do occur, it would be preferable not to abort those concurrent write transactions.

What I want is to create an iterator over the response of a single rangeRead() that will exceed the 5-sec time limit and the 1000000-byte limit.
A few challenges with this are:

  • How do I break a single rangeRead into multiple rangeReads (batching)?
  • How do I manage my transaction? Should I close/commit the transaction at the end of each batch and start a new one at the start of the next?
  • Are snapshot reads the only possible strategy? Does FoundationDB support other kinds of read-only transactions?

Any pseudocode would be highly appreciated.

This thread may look like a copy of Large Range Scans - avoid 5s limit, but that thread (292) is a year old and looks dead. Also, the challenges I mentioned are not discussed in 292.

I have posted a rough outline of the code structure I use for similar goals - see if that is useful to you.
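The batching loop can be sketched roughly as follows. This is a minimal illustration, not the real FoundationDB API: `SimpleStore` and its `get_range` method are hypothetical stand-ins for a key-ordered store. With the actual `fdb` Python bindings, the same loop shape applies, with each batch issued in a fresh transaction via `tr.get_range(...)` and the resume point expressed with `fdb.KeySelector.first_greater_than(last_key)`:

```python
import bisect

class SimpleStore:
    """Hypothetical in-memory stand-in for a key-ordered store (not the fdb API)."""
    def __init__(self, items):
        self.keys = sorted(k for k, _ in items)
        self.data = dict(items)

    def get_range(self, begin, end, limit):
        # Return up to `limit` (key, value) pairs with begin <= key < end.
        lo = bisect.bisect_left(self.keys, begin)
        hi = bisect.bisect_left(self.keys, end)
        return [(k, self.data[k]) for k in self.keys[lo:hi][:limit]]

def scan_in_batches(store, begin, end, batch_size):
    """Iterate a large range as a series of bounded reads.

    In FoundationDB, each batch would run in its own short-lived
    transaction, keeping every individual read well under the 5-second
    limit. Resuming strictly after the last key of the previous batch
    means no row is read twice and none is skipped.
    """
    cursor = begin
    while True:
        batch = store.get_range(cursor, end, limit=batch_size)
        if not batch:
            break  # range exhausted
        for kv in batch:
            yield kv
        # Smallest key strictly greater than the last key seen
        # (in fdb: fdb.KeySelector.first_greater_than(last_key)).
        cursor = batch[-1][0] + "\x00"
```

Since the caller consumes a generator, any slow per-row processing happens outside the (simulated) transaction, so it cannot push an individual read past the time limit.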

I think this is largely addressed on the other thread now, but I’ll add that there’s no need to commit a read-only transaction as it won’t have any effect.

A regular transaction will be a read-only transaction so long as you don’t write to it (or, if you do write to it, I suppose you could still think of it as read-only if you don’t commit it).

You are actually likely to experience the opposite behavior if your read spans multiple transactions, unless your data model provides some mechanism to avoid it. In other words, each time you start a new transaction, you will be reading at a newer version, and you may see data that was written after you first started your scan. It is not currently possible to have a long-running read operation that sees a consistent view of the database throughout, though future storage engines may support this for read-only transactions.
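To make that caveat concrete, here is a toy illustration (a plain dict stands in for the database; `read_batch` is a hypothetical helper, not the fdb API). A key written between two batches shows up in the scan's output, because the second batch reads at a newer version:

```python
# Live store shared by the "scanner" and a concurrent "writer".
store = {"a": 1, "c": 3}

def read_batch(store, after, limit):
    # Return up to `limit` keys strictly greater than `after`,
    # as seen at the moment of the call (i.e. at the latest version).
    keys = sorted(k for k in store if k > after)
    return keys[:limit]

seen = []
seen += read_batch(store, "", limit=1)       # first "transaction" reads ["a"]
store["b"] = 2                               # concurrent write lands here
seen += read_batch(store, seen[-1], limit=5) # new "transaction" sees "b" too
# seen is now ["a", "b", "c"]: "b" appeared mid-scan even though the
# scan started before it was written.
```

Whether this matters depends on the data model; for example, if keys in the range are only ever written once and never mutated, the multi-transaction scan is still effectively consistent.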

Read-only transactions have no impact on concurrent writes. So long as you are OK with concurrent writes affecting the consistency of your read as described above, you would not have to worry about this.