Large Range Scans - avoid 5s limit

forEach processes keys serially, in strictly monotonic order.

PS: read AJ’s comment as well. For this snippet to work, you have to disable backoff for the transaction_too_old error on transaction retry; otherwise you will see unnecessary delays between transaction continuations.

(edited to remove confusing statements)

@osamarin I checked the code and there is no forEachAsync method. There is only forEach, which is the async method (i.e. it returns a future). I got mixed up in my thoughts. There are multiple versions of forEach: one that takes an executor as a param, and one that uses a default executor.

Apologies for the confusion.

Thanks. I read Java: concurrency questions, where they suggested using forEachAsync. It was a misprint.

You might be interested in this issue: https://github.com/apple/foundationdb/issues/3391. Basically, this API allows you to split your large target range into similarly sized chunks, which you can then scan in parallel on a bunch of worker processes.
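To make the idea concrete, here is a minimal sketch of scanning pre-split chunks in parallel. It does not call FoundationDB at all: a NavigableMap stands in for the key space, and the boundary list stands in for whatever the split API returns. The names ParallelChunks, toRanges and countInParallel are made up for illustration, and I am assuming the boundary list includes both ends of the target range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelChunks {
    // Turn boundaries [p0, p1, ..., pn] into half-open sub-ranges [p0,p1), [p1,p2), ...
    static List<String[]> toRanges(List<String> boundaries) {
        List<String[]> ranges = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            ranges.add(new String[] {boundaries.get(i), boundaries.get(i + 1)});
        }
        return ranges;
    }

    // Count the keys of every sub-range on a worker pool and sum the results.
    // The map is only read here, never modified (assumption of this sketch).
    static long countInParallel(NavigableMap<String, String> store,
                                List<String> boundaries, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (String[] r : toRanges(boundaries)) {
                parts.add(CompletableFuture.supplyAsync(
                        () -> (long) store.subMap(r[0], true, r[1], false).size(), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> part : parts) {
                total += part.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real deployment each sub-range would be read in its own transaction (or on its own worker process), so no single transaction has to cover the whole range.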

I have successfully used this technique for a large scan, thanks for the pointer.

I was wondering what the difference was between this and an iteration across several chunks with a limit on the range (where you keep going as long as the number of returned elements matches the limit).
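A rough sketch of the limit-based iteration described above, in case it helps. Again this is not FDB code: a NavigableMap stands in for the database, readChunk stands in for one bounded range read, and all names (ChunkedScan, readChunk, scanAll) are hypothetical. The loop keeps going as long as a chunk comes back full, and resumes just after the last key it saw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class ChunkedScan {
    // Stand-in for one bounded range read: at most `limit` keys from the range
    // starting at `begin` (inclusive or exclusive) and ending before `end`.
    static List<String> readChunk(NavigableMap<String, String> store,
                                  String begin, boolean inclusive, String end, int limit) {
        List<String> keys = new ArrayList<>();
        for (String key : store.subMap(begin, inclusive, end, false).keySet()) {
            if (keys.size() == limit) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    // Scan [begin, end) chunk by chunk; a short chunk means the range is exhausted.
    static List<String> scanAll(NavigableMap<String, String> store,
                                String begin, String end, int limit) {
        List<String> all = new ArrayList<>();
        String cursor = begin;
        boolean inclusive = true;
        while (true) {
            List<String> chunk = readChunk(store, cursor, inclusive, end, limit);
            all.addAll(chunk);
            if (chunk.size() < limit) {
                break;
            }
            // Full chunk: continue strictly after the last key returned.
            cursor = chunk.get(chunk.size() - 1);
            inclusive = false;
        }
        return all;
    }
}
```

With this approach the chunks are discovered one after another, so the scan is inherently sequential; with split points known up front, the chunks can be handed to workers at the same time.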

Do you have any idea of when this API will become available for general use?

I am unable to understand what is the second approach that you described. Could you elaborate on it, maybe with pseudo code?

The API is already merged into the master branch, so I think it is targeting the 7.0 release. As for when 7.0 will be released, I believe it’s a question for @mengxu @Evan @markus.pilman @jzhou

We are planning on increasing the frequency of releases (6.2 is pretty old and 6.3 is now about finished). The 7.0 release branch will be cut 1st of May. We will then still need some time to make everything stable but hopefully we can do this quickly (we make major investments into additional testing infrastructure and automation). So an optimistic release date would be beginning of June.

Hi Markus, any news about the 7.0 release? Is there a place where I can follow the work left?

Sadly, there’s currently not a lot of transparency when it comes to this stabilization process. As I expected, June 1st was way too optimistic, but we’re getting closer to a first release.

We probably will tag a release soon (in the following weeks). A tag just means that our testing (performance and correctness) looks good – and this is usually the point in time where we start pushing the release to our QA and pre-production clusters.

But we (or rather Apple) typically publish a release only after we have upgraded some production clusters either at Snowflake or at Apple, and I can’t publish timelines for that.

If you have QA deployments where you want to test FDB you can start upgrading those hopefully very soon.

No worries, I was just wondering if I could help on some (beginners) issues.

Looking forward to playing with it :rocket:

I think there’s a small bug: seen calls get twice when computing nextKey, but the value should be read only once; otherwise the two reads can yield two different values.

It is just an atomic reference that will return the same value unless it is updated to something different.
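For what it’s worth, reading the reference once into a local makes the question moot. A tiny sketch, with made-up names (ReadOnce, nextKey, seen, begin) since the original snippet is not quoted here, and assuming the next key is the last seen key followed by a zero byte:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReadOnce {
    // Read the reference exactly once so both the null check and the
    // key computation see the same value, even if another thread updates it.
    static String nextKey(AtomicReference<String> seen, String begin) {
        String last = seen.get();
        return last == null ? begin : last + "\0";
    }
}
```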

Yes, you are correct; in this case the value doesn’t need to be the same.