Large Range Scans - avoid 5s limit

gaurav · June 18, 2020, 11:52am

forEach processes keys serially, in strictly increasing key order.

PS: read AJ’s comment as well. For this snippet to work well, you will have to disable backoff for the transaction_too_old error on transaction retry. Otherwise you will see unnecessary delays between transaction continuations.

(edited to remove confusing statements)
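
For anyone reading without the snippet in front of them, the continuation idea can be sketched without the FDB bindings at all. This is only an illustration under loud assumptions: a TreeMap stands in for the keyspace, `TransactionTooOld` is a hypothetical stand-in for the transaction_too_old error (code 1007), and `maxKeysPerTxn` fakes the 5 second limit. The point it shows is that the resume key is remembered outside the "transaction", and the retry happens immediately, with no backoff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

class ContinuationScanSketch {
    /** Hypothetical stand-in for FDB's transaction_too_old error. */
    static class TransactionTooOld extends RuntimeException {}

    /**
     * Simulates one transaction: visits keys >= nextKey in order, but "expires"
     * once maxKeysPerTxn keys have been read (a fake for the 5 second limit).
     */
    static void scanOnce(NavigableMap<String, String> store, AtomicReference<String> nextKey,
                         int maxKeysPerTxn, List<String> out) {
        int seen = 0;
        for (Map.Entry<String, String> e : store.tailMap(nextKey.get(), true).entrySet()) {
            if (seen == maxKeysPerTxn) {
                throw new TransactionTooOld();
            }
            out.add(e.getKey());
            // Remember where to resume: the smallest key strictly after this one.
            nextKey.set(e.getKey() + "\0");
            seen++;
        }
    }

    /** Keeps starting fresh "transactions" from the saved key until the scan finishes. */
    static List<String> scanAll(NavigableMap<String, String> store, int maxKeysPerTxn) {
        if (maxKeysPerTxn < 1) {
            throw new IllegalArgumentException("maxKeysPerTxn must be at least 1");
        }
        AtomicReference<String> nextKey = new AtomicReference<>("");
        List<String> out = new ArrayList<>();
        while (true) {
            try {
                scanOnce(store, nextKey, maxKeysPerTxn, out);
                return out;
            } catch (TransactionTooOld e) {
                // Continue right away from nextKey; no backoff between continuations.
            }
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            store.put(k, "v");
        }
        System.out.println(scanAll(store, 3));
    }
}
```

Note that a scan stitched together this way is not a single consistent snapshot, since each continuation reads at a newer version.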

gaurav · June 18, 2020, 1:36pm

@osamarin I checked the code and there is no forEachAsync method. There is only forEach, which is the async method (i.e. it returns a future). I got mixed up in my thoughts. There are multiple versions of forEach: one that takes an executor as a parameter, and one that uses a default executor.

Apologies for the confusion.

github.com

apple/foundationdb/blob/4f6054a3e6ff9321de3ff26b39b6151ada83e725/bindings/java/src/main/com/apple/foundationdb/async/AsyncUtil.java#L95


	 *  or by the consumer provided or with {@code null} if the future completes
	 *  successfully. Items are processed in order from the iterable, and each item
	 *  will be processed only after the item before it has finished processing.
	 *
	 * @param iterable the source of data over from which to consume
	 * @param consumer operation to apply to each item
	 * @param <V> type of the items returned by the iterable
	 *
	 * @return a future that is ready once the asynchronous operation completes
	 */
	public static <V> CompletableFuture<Void> forEach(final AsyncIterable<V> iterable, final Consumer<? super V> consumer) {
		return forEachRemaining(iterable.iterator(), consumer);
	}

	/**
	 * Run the {@code consumer} on each element of the iterable in order. The future will
	 *  complete with either the first error encountered by either the iterable itself
	 *  or by the consumer provided or with {@code null} if the future completes
	 *  successfully. Items are processed in order from the iterable, and each item
	 *  will be processed only after the item before it has finished processing. Asynchronous
	 *  tasks needed to complete this operation are scheduled on the provided executor.

osamarin · June 18, 2020, 4:03pm

Thanks. I read the "Java: concurrency questions" thread, where they suggested using forEachAsync. It was a misprint.

dongxinEric · June 18, 2020, 9:31pm

You might be interested in this issue: https://github.com/apple/foundationdb/issues/3391. Basically, this API allows you to split your target large range into similarly sized chunks, and then you can scan them in parallel on a bunch of worker processes.
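
To make the shape of that approach concrete, here is a minimal sketch that does not touch FDB. Everything in it is an assumption for illustration: a TreeMap plays the keyspace, the `splitPoints` helper is hypothetical (it is not the API from the issue, and it splits by key count rather than by bytes), and a thread pool stands in for the worker processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelChunkScanSketch {
    /**
     * Hypothetical stand-in for a split-points call: returns boundary keys
     * (begin first, end last) so that each chunk holds about chunkSize keys.
     */
    static List<String> splitPoints(NavigableMap<String, String> store,
                                    String begin, String end, int chunkSize) {
        List<String> points = new ArrayList<>();
        points.add(begin);
        int n = 0;
        for (String k : store.subMap(begin, true, end, false).keySet()) {
            if (n > 0 && n % chunkSize == 0) {
                points.add(k);
            }
            n++;
        }
        points.add(end);
        return points;
    }

    /** Counts keys in [begin, end) by scanning each chunk on its own worker. */
    static long countInParallel(NavigableMap<String, String> store,
                                String begin, String end, int chunkSize, int workers) {
        List<String> points = splitPoints(store, begin, end, chunkSize);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (int i = 0; i + 1 < points.size(); i++) {
                final String lo = points.get(i);
                final String hi = points.get(i + 1);
                // Each chunk [lo, hi) is small enough to scan independently.
                parts.add(CompletableFuture.supplyAsync(
                        () -> (long) store.subMap(lo, true, hi, false).size(), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> p : parts) {
                total += p.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            store.put(String.format("k%02d", i), "v");
        }
        System.out.println(splitPoints(store, "k00", "k99", 3));
        System.out.println(countInParallel(store, "k00", "k99", 3, 4));
    }
}
```

The chunks are half-open ranges that share boundaries, so every key lands in exactly one chunk.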

pyr · March 18, 2021, 2:35pm

I have successfully used this technique for a large scan, thanks for the pointer.

I was wondering what the difference was between this and an iteration across several chunks with a limit on the range (where you keep going as long as the number of returned elements matches the limit).

pyr · March 18, 2021, 2:36pm

Do you have any idea of when this API will become available for general use?

gaurav · March 18, 2021, 3:12pm

I am unable to understand the second approach that you described. Could you elaborate on it, maybe with pseudocode?
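
One possible reading of that second approach (an assumption on my part, not confirmed by pyr) is plain limit-based paging: read at most `limit` keys, and keep going from just after the last key while a page comes back full. A minimal sketch, with a TreeMap standing in for the keyspace and hypothetical helper names `readPage` and `scanWithLimit`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class LimitPagingSketch {
    /** Stand-in for one limited range read: at most `limit` keys from [begin, end). */
    static List<String> readPage(NavigableMap<String, String> store,
                                 String begin, String end, int limit) {
        List<String> page = new ArrayList<>();
        for (String k : store.subMap(begin, true, end, false).keySet()) {
            if (page.size() == limit) {
                break;
            }
            page.add(k);
        }
        return page;
    }

    /** Keeps reading pages as long as each page comes back full. */
    static List<String> scanWithLimit(NavigableMap<String, String> store,
                                      String begin, String end, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        List<String> all = new ArrayList<>();
        String cursor = begin;
        while (true) {
            List<String> page = readPage(store, cursor, end, limit);
            all.addAll(page);
            if (page.size() < limit) {
                return all; // a short page means the range is exhausted
            }
            // Resume from the smallest key strictly after the last one returned.
            cursor = page.get(page.size() - 1) + "\0";
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            store.put(k, "v");
        }
        System.out.println(scanWithLimit(store, "a", "z", 3));
    }
}
```

Read this way, the pages are discovered one after another, whereas split points are known up front and can be handed to parallel workers.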

dongxinEric · March 18, 2021, 4:31pm

The API is already merged into the master branch, so I think it is targeting the 7.0 release. As for when 7.0 will be released, I believe it’s a question for @mengxu @Evan @markus.pilman @jzhou

markus.pilman · March 18, 2021, 4:57pm

We are planning on increasing the frequency of releases (6.2 is pretty old and 6.3 is now about finished). The 7.0 release branch will be cut on the 1st of May. We will then still need some time to make everything stable, but hopefully we can do this quickly (we are making major investments in additional testing infrastructure and automation). So an optimistic release date would be the beginning of June.

PierreZ · July 1, 2021, 8:19am

Hi Markus, any news about the 7.0 release? Is there a place where I can follow the work left?

markus.pilman · July 1, 2021, 3:25pm

Sadly, there’s currently not a lot of transparency when it comes to this stabilization process. As I expected, June 1st was way too optimistic, but we’re getting closer to a first release.

We probably will tag a release soon (in the following weeks). A tag just means that our testing (performance and correctness) looks good – and this is usually the point in time where we start pushing the release to our QA and pre-production clusters.

But we (or rather Apple) typically publish a release only after we have upgraded some production clusters either at Snowflake or at Apple, and I can’t publish timelines for that.

If you have QA deployments where you want to test FDB, you can hopefully start upgrading those very soon.

PierreZ · July 1, 2021, 9:48pm

No worries, I was just wondering if I could help on some (beginners) issues.

Looking forward to playing with it.

mping · September 12, 2022, 4:37pm

I think there’s a small bug: the code calls get twice when using nextKey, but the value should be read only once; otherwise it can yield two different values.

gaurav · September 12, 2022, 4:56pm

It is just an atomic reference that will return the same value unless it is updated to something different.
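
A tiny illustration of that behaviour, using only `java.util.concurrent.atomic.AtomicReference` (the class name `NextKeySketch` and the key values are made up for the example):

```java
import java.util.concurrent.atomic.AtomicReference;

class NextKeySketch {
    /** Returns three observations of the reference: two before an update, one after. */
    static String[] observe() {
        AtomicReference<String> nextKey = new AtomicReference<>("key-001");
        String first = nextKey.get();
        String second = nextKey.get(); // no set() in between, so this matches first
        nextKey.set("key-002");
        String third = nextKey.get();  // reflects the update
        return new String[] {first, second, third};
    }

    public static void main(String[] args) {
        for (String s : observe()) {
            System.out.println(s);
        }
    }
}
```

Two consecutive get calls could only disagree if another thread called set in between; reading once into a local variable is the defensive option when that is possible.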

mping · September 12, 2022, 5:09pm

Yes, you are correct; in this case the value doesn’t need to be the same.
