Is It Possible to Force Prefetching on AsyncIterators in the FDB Client?

Hi:

I have an application that uses the FDB Java binding. The application makes an array of range queries to FDB. The current implementation has each range query return an AsyncIterator; the application then scans through the array of iterators, one iterator at a time.

According to the inline documentation of RangeQuery at: https://github.com/apple/foundationdb/blob/master/bindings/java/src/main/com/apple/foundationdb/RangeQuery.java#L37

If the calling program uses an asynchronous paradigm, a non-blocking {@link AsyncIterator} is returned from {@link #iterator()}. Both of these constructions will not begin to query the database until the first call to {@code hasNext()}

So the actual query to the remote FDB cluster does not start until the first hasNext() call actually happens. While the application is iterating over the array of iterators, it would be great if the later iterators could already be fetching query results, i.e., pre-fetching.

To achieve pre-fetching, the proposal is the following: right after the array of iterators is returned, but before actually consuming them, the application issues:

for (AsyncIterator iter : CollectedIterators) {
    iter.hasNext();
}

Each hasNext() call then forces the query to be issued to FDB and returns the first chunk of the query result.

So my question is whether the above pre-fetching proposal makes sense from a performance point of view; more specifically:

(1) Do the hasNext() calls on all of these iterators make use of the FDB Java binding's managed thread pool, so that we can achieve concurrent pre-fetching of the iterators, up to the number of threads allocated in the pool?

(2) Since we use StreamingMode.WANT_ALL, the first chunk of up to 8196 bytes will be returned from the iterator's hasNext() call. As most of our query results are not larger than 8 KB, the first pre-fetch will usually get back all of the results that the application wants. Is this correct?

(3) Eventually, if the application issues a large number of AsyncIterators, the bottleneck will be the FDB Java binding's network thread, which is a single thread. Is this correct? If so, is there some way to mitigate this single-thread problem and drive more query throughput from the FDB client?

Thanks!

Some of this behavior is controlled with StreamingMode, though if you want to do read-ahead, I would wrap the AsyncIterator with your own iterator that does those prefetches as you consume the stream. WANT_ALL grabs the entire result, so no prefetching would be necessary. If your streams are that small, WANT_ALL is probably what you want.
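To make the wrapping idea concrete, here is a minimal sketch in plain Java, with no FDB classes involved; `PrefetchingSource`, `fetch`, and `next` are illustrative names, not FDB API. The shape is the same one a read-ahead wrapper around an AsyncIterator would take: while the consumer processes chunk N, the request for chunk N+1 is already in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical read-ahead wrapper (names are illustrative, not FDB API).
// While the consumer works on the current chunk, the next request is
// already in flight, so network latency overlaps with processing time.
class PrefetchingSource<T> {
    private final Supplier<CompletableFuture<T>> fetch; // issues one async chunk request
    private CompletableFuture<T> inFlight;              // the chunk fetched ahead of time

    PrefetchingSource(Supplier<CompletableFuture<T>> fetch) {
        this.fetch = fetch;
        this.inFlight = fetch.get(); // start the first request immediately
    }

    // Hand back the prefetched chunk and immediately request the next one.
    T next() {
        CompletableFuture<T> ready = inFlight;
        inFlight = fetch.get(); // prefetch the following chunk before we block
        return ready.join();    // blocks only if the chunk has not arrived yet
    }
}
```

A wrapper like this can keep more than one chunk in flight by holding a small queue of futures instead of a single `inFlight` field, at the cost of more buffered memory.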

Kind of. The way that it works is that:

  1. All network calls actually happen in the C library, which is called from the Java library using a JNI bridge. The C library is single-threaded but it uses a callback-based mechanism to allow for multiple concurrent requests to happen simultaneously. This means that in practice, the single-threadedness is only a problem when the CPU work exceeds that of a single core.
  2. When a future completes, it schedules a callback that will execute in the Java bindings' managed thread pool. It’s on this thread pool that things like marshaling data from C to Java happen, and where any callbacks the user sets on futures (using .thenApply or .thenCompose, etc.) will be executed. So there can be as many callbacks executing at any given time as there are threads in the thread pool.
  3. The .hasNext method is a blocking method, so if you called iter.hasNext() on each AsyncIterator, that will end up starting the first iterator and blocking, and then when that’s done, it will start the next iterator.

The consequence of that is that the number of outstanding requests to the database is not really linked to the number of threads in the thread pool, as it’s the FDB C client that manages actually talking to the database, and it doesn’t use the Java threads for that purpose. (The thread pool size instead puts an upper limit on the number of concurrent requests that can be marshaled from C to Java at any given time.)

That being said, I think you can achieve a similar result by doing something like:

for (AsyncIterator iter : CollectedIterators) {
    iter.onHasNext();
}

The onHasNext method is the asynchronous version of hasNext, and it will start work on the iterator and give you a future for when it’s done. So the above snippet ensures that all of the iterators have started, but it doesn’t wait for any of them. You can then either collect the returned futures and wait for any/all of them to complete, or you can call .hasNext() or .next() on one iterator, which will block the thread your code is currently in, but the other iterators will still be (pre-)fetching in the background.

tl;dr You can call .onHasNext instead of .hasNext to start the query without blocking, and then you can use that to begin pre-fetching all the iterators in parallel.

Yeah, that sounds right to me. The WANT_ALL semantics will give you a chunk of results, and then the AsyncIterator implementation will iterate over the key-value pairs in that chunk. If your data is typically smaller than that chunk size, then you should only need to make the one fetch.

It’s correct that the network thread is single-threaded, though as mentioned above, it uses a callback-based concurrency model that allows it to execute many tasks concurrently, even if it’s limited to a single core for CPU work. That being said, the CPU can become a bottleneck, particularly for some workloads. At the moment, the simplest solution for that is to run multiple clients (where each client is one process, i.e., one JVM, in this case). So, for example, rather than running one process with a large thread pool, you might consider running multiple processes with smaller thread pools to better soak up all the available CPU. However, that does imply that your application must then load balance across these multiple database clients, which may or may not be simple.

Another problem that sometimes manifests as poor throughput is insufficient parallelism (e.g., blocking on futures instead of firing off multiple futures at once), and for that, doing things like the pre-fetch proposal you’re asking about here is the right solution. More on that in the developer guide: https://apple.github.io/foundationdb/developer-guide.html#throughput-requires-concurrency
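The throughput difference the guide describes comes purely from overlapping round trips. A toy illustration, with simulated latency and no FDB involved (`Throughput` and `read` are made-up names; `read` stands in for a single async database request):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// "Throughput requires concurrency": with per-request latency L, the
// sequential form pays roughly N * L in total, while firing all requests
// first pays roughly L, because the waits overlap.
class Throughput {
    // Simulated async read with ~50 ms latency (stands in for one request).
    static CompletableFuture<Integer> read(int key) {
        return CompletableFuture.supplyAsync(() -> key * 2,
                CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS));
    }

    // Anti-pattern: blocking on each future in turn serializes the round trips.
    static List<Integer> sequential(List<Integer> keys) {
        List<Integer> out = new ArrayList<>();
        for (int k : keys) out.add(read(k).join()); // next read starts only after this one
        return out;
    }

    // Fire all requests first, then collect: the latencies overlap.
    static List<Integer> concurrent(List<Integer> keys) {
        List<CompletableFuture<Integer>> inFlight = new ArrayList<>();
        for (int k : keys) inFlight.add(read(k));   // every request starts here
        List<Integer> out = new ArrayList<>();
        for (CompletableFuture<Integer> f : inFlight) out.add(f.join());
        return out;
    }
}
```

Both methods return the same results; only the wall-clock time differs, which is exactly the distinction between blocking per future and the onHasNext-style pre-fetching discussed above.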