Hi:
I have an application that uses the FDB Java binding. The application issues an array of range queries to FDB. In the current implementation, each range query returns an AsyncIterator, and the application then scans through the array of iterators, one iterator at a time.
According to the inline documentation of RangeQuery at: https://github.com/apple/foundationdb/blob/master/bindings/java/src/main/com/apple/foundationdb/RangeQuery.java#L37
If the calling program uses an asynchronous paradigm, a non-blocking {@link AsyncIterator} is returned from {@link #iterator()}. Both of these constructions will not begin to query the database until the first call to {@code hasNext()}
So the actual query to the remote FDB does not start until the first hasNext() call happens. While the application is iterating over the array of iterators, it would be great if the remaining iterators could already start fetching their query results, i.e. pre-fetching.
To achieve pre-fetching, the proposal is as follows. Right after the array of iterators is returned, but before actually using the iterators, the application issues the following calls:
for (AsyncIterator iter : CollectedIterators) {
    iter.hasNext();
}
Each hasNext() call then forces the query to be issued to FDB and returns the first chunk of the query result.
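To make the intended behavior concrete, here is a minimal, self-contained sketch of the pre-fetching pattern. It does not use the FDB binding at all: LazyQuery, prefetchAll and the 50 ms simulated round trip are hypothetical stand-ins that I made up for illustration, and the fixed thread pool is only an assumption about how the work might be scheduled. The sketch uses a non-blocking onHasNext()-style call that returns a CompletableFuture (the real AsyncIterator also exposes onHasNext()); whether the real binding behaves like this model is exactly what I am asking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PrefetchSketch {

    /** Hypothetical stand-in for one lazily started range query (not an FDB class). */
    public static final class LazyQuery {
        private final ExecutorService pool;
        private final List<String> rows;
        private CompletableFuture<List<String>> firstChunk; // null until the query is started

        public LazyQuery(ExecutorService pool, List<String> rows) {
            this.pool = pool;
            this.rows = rows;
        }

        /** Starts the simulated fetch on the first call and returns without blocking. */
        public synchronized CompletableFuture<Boolean> onHasNext() {
            if (firstChunk == null) {
                firstChunk = CompletableFuture.supplyAsync(() -> {
                    simulateRoundTrip();
                    return rows;
                }, pool);
            }
            return firstChunk.thenApply(chunk -> !chunk.isEmpty());
        }

        public synchronized boolean isStarted() {
            return firstChunk != null;
        }

        private static void simulateRoundTrip() {
            try {
                Thread.sleep(50); // pretend network latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Kicks off every query without waiting; returns how many have been started. */
    public static int prefetchAll(List<LazyQuery> queries) {
        int started = 0;
        for (LazyQuery q : queries) {
            q.onHasNext(); // future ignored here; it is obtained again when consuming
            if (q.isStarted()) {
                started++;
            }
        }
        return started;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<LazyQuery> queries = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            queries.add(new LazyQuery(pool, Collections.singletonList("row-" + i)));
        }
        System.out.println("started: " + prefetchAll(queries));
        for (LazyQuery q : queries) {
            // By now all fetches overlap, so each join() waits only for what is left.
            System.out.println("hasNext: " + q.onHasNext().join());
        }
        pool.shutdown();
    }
}
```

In this model, all simulated fetches are in flight before the first result is consumed, which is the effect I am hoping to get from the real iterators.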
So my question is whether the above pre-fetching proposal makes sense from a performance point of view. More specifically:
(1) Do the hasNext() calls on all of these iterators make use of the thread pool managed by the FDB Java binding, so that we can achieve concurrent pre-fetching of the iterators, up to the number of threads allocated in the thread pool?
(2) Since we use StreamingMode.WANT_ALL, the first chunk of up to 8196 bytes will be returned from the iterator's hasNext() call. Most of our query results are no larger than 8KB, so the first pre-fetch will mostly get back all of the results that the application wants. Is this correct?
(3) Eventually, if the application issues a large number of AsyncIterators, the bottleneck will be the FDB Java binding’s network thread, which is just a single thread. Is this correct? If so, is there some way to mitigate this single-thread problem and drive more query throughput from the FDB client?
Thanks!