Latency of range queries that return a large number of key-value pairs

Hi,

I am using FoundationDB for my application. In my use case, the application frequently issues range queries that return thousands, even tens of thousands of key-value pairs.

I notice that there are two ways to execute a range query: 1) using tx.getRange().asList().get() to get all key-value pairs at once or 2) using tx.getRange().iterator() to get the key-value pairs iteratively. Below is my Java code:

Using asList()

private void execNonAsync(Database db, Range range, int expect) throws ExecutionException, InterruptedException {
    Transaction tx = db.createTransaction();
    // Fetch the entire range into memory with a single asList() call
    List<KeyValue> kvs = tx.getRange(range).asList().get();
    for (KeyValue kv: kvs) {
        kv.getKey();
        kv.getValue();
    }
    tx.commit().get();
    tx.close();
}

Using iterator()

private void execAsync(Database db, Range range, int expect) throws ExecutionException, InterruptedException {
    Transaction tx = db.createTransaction();
    // Iterate over the range; results are fetched in batches as the iterator advances
    AsyncIterator<KeyValue> iter = tx.getRange(range).iterator();
    while (iter.hasNext()) {
        KeyValue kv = iter.next();
        kv.getKey();
        kv.getValue();
    }
    tx.commit().get();
    tx.close();
}

I ran a microbenchmark to measure the latency of both approaches. Below are the results (in milliseconds). My conclusion is that asList() provides markedly lower latency compared with iterator(), and the gap widens as the range size (the number of key-value pairs returned) increases.

Range Size      Using asList()      Using iterator()
1               1.213090            1.241620          
10              1.237670            1.920940
20              1.299140            1.966210
50              1.386290            2.400080
100             1.515270            2.947490
1000            3.316320            6.013990

The results were gathered with a one-node FDB cluster and a second node running my benchmark Java program. The database has 100k keys in total (each key is 8 bytes, each value is 100 bytes) and each query is repeated 100k times.

My questions are:

  1. Is the latency difference caused by asList() getting all key-value pairs in one call to FDB, while iterator() issues multiple asynchronous calls (one per fetched key-value pair)? If not, what explains the higher latency?

  2. I want to use iterator() in my application because that way the application does not hold all key-value pairs in memory and can process the data as soon as some key-value pairs are available. My main concern is the high latency of iterator() as it stands. Are there tunings or configurations that reduce its latency? Ideally, the latency of iterator() should not be much higher than that of asList() as the range size increases.

Here is the link to the full code of my benchmark.
http://collabedit.com/xdwbk

Any help/recommendations would be appreciated. Thanks!

If I recall correctly, asList() issues the range request with the ‘WantAll’ option, which prioritizes fetching the data in as few server roundtrips as possible. This could explain the difference in time. However, the asList() method requires you to have enough memory available to hold all the values in the requested range.

If I use iterator() to retrieve N keys, is the number of roundtrips to the server (not counting the request to get the read version) N or less? Does iterator() also pre-fetch data? I think the latency would be reduced if iterator() also fetched the data in batches, say 10 keys at a time, so they are available locally. If it already does so, is this number (10) configurable somehow?

The way range queries work is that they request results from the cluster in batches, using a byte limit determined by your streaming mode. The default streaming mode for the iterator (called ITERATOR) starts at a low byte limit and gradually increases it, while calling asList uses WANT_ALL or EXACT, depending on whether you specify a row limit. If it has to make enough requests, the ITERATOR mode eventually reaches the same byte limit as WANT_ALL, so a sufficiently large range should hopefully take a similar amount of time, in relative terms, with either approach.

If you use the WANT_ALL streaming mode in your iterator example, I would expect the times to be similar.
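Something like this, for example (a rough sketch, assuming the getRange overload that takes a row limit, a reverse flag, and a StreamMode, with a limit of 0 meaning no row limit):

private void execAsyncWantAll(Database db, Range range) throws ExecutionException, InterruptedException {
    Transaction tx = db.createTransaction();
    // Same iterator loop as before, but ask the client to batch aggressively with WANT_ALL
    AsyncIterator<KeyValue> iter = tx.getRange(range, 0, false, StreamMode.WANT_ALL).iterator();
    while (iter.hasNext()) {
        KeyValue kv = iter.next();
        kv.getKey();
        kv.getValue();
    }
    tx.commit().get();
    tx.close();
}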

I tried the stream modes and you are right: the latency dropped when I used a StreamMode other than ITERATOR. Thank you very much!

Do you have more details on the differences between these stream modes? I traced the Java code, but it seems the parameter is just passed through to the C library. For example:

  1. What is the fetch chunk size of each stream mode (SMALL, MEDIUM, LARGE, SERIAL, ITERATOR)?
  2. What are the differences between LARGE and SERIAL?
  3. If I select WANT_ALL, does it get all the data in one call, or does it still fetch data in chunks? If it still fetches in chunks, is the chunk size larger than in LARGE mode?

You can see the values in this array, with the iterator progression right below it:

SMALL is 256, MEDIUM is 1000, LARGE is 4096, and SERIAL/WANT_ALL are 80000 (all byte limits).

Only EXACT does the query in one go.
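For example (a rough sketch; my understanding is that EXACT needs an explicit row limit, so you pass the expected number of rows):

private void execExact(Database db, Range range, int expect) throws ExecutionException, InterruptedException {
    Transaction tx = db.createTransaction();
    // EXACT fetches up to 'expect' rows in a single request rather than in growing batches
    List<KeyValue> kvs = tx.getRange(range, expect, false, StreamMode.EXACT).asList().get();
    for (KeyValue kv : kvs) {
        kv.getKey();
        kv.getValue();
    }
    tx.commit().get();
    tx.close();
}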

@ajbeamon Thank you so much! I will play around with these parameters and share with you if I have some interesting results.

I had also filed “Add to the Java docs a .asList() vs .iterator() comparison” because you’re not the first person I’ve seen be confused by this and benchmark it. If you do enough playing around to have all the content that we should add to the docs on this subject anyway, it’d be wonderful for that to turn into a PR. :slight_smile:

I guess the other thing to benchmark would be time to first byte. In theory, if one needs to pipeline asynchronous tasks (one per result returned from the range query), one should get better performance using the iterator than using asList.
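A rough sketch of that pattern, using AsyncUtil.forEach (from com.apple.foundationdb.async) to consume the range without blocking on hasNext(); processKeyValue here is a hypothetical placeholder for whatever per-result work you would pipeline:

private CompletableFuture<Void> execPipelined(Database db, Range range) {
    Transaction tx = db.createTransaction();
    // Start per-result work as soon as each KeyValue arrives, instead of waiting for the whole range
    return AsyncUtil.forEach(tx.getRange(range), kv -> processKeyValue(kv.getKey(), kv.getValue()))
            .thenCompose(ignore -> tx.commit())
            .whenComplete((ignore, err) -> tx.close());
}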

Can we manually set the byte size instead of using streaming modes? Say, instead of streaming mode SMALL, I want the chunk size to be 9000 bytes. Can I do that?