Proposal for a `fdb.range_near(key, limit)`

amirouche · November 10, 2019, 7:11pm

While working on approximate string matching (https://stackoverflow.com/q/58065020/140837) I stumbled upon the need to look up the nearest keys around an input key. Hence the idea of having a fdb.range_near(key, limit).

A workaround could be to query by prefixes of key of decreasing length until either limit keys have been found or the prefix is empty. That seems like a waste, because the results of db.range_prefix(key[:len(key) - n]) are included in the results of db.range_prefix(key[:len(key) - n - 1]) (where n is strictly bigger than the length of the associated subspace prefix).

Maybe RankedSet will be a better solution?

alexmiller · November 10, 2019, 7:14pm

Could you explain precisely what you would wish the semantics of range_near to be?

amirouche · November 10, 2019, 8:35pm

I wish to be able to fetch at most LIMIT key-value pairs around KEY. For instance,
given the following database:

+=========+=========+
|   key   |  value  |
+=========+=========+
| b'\x00' |  "foo"  |
+---------+---------+
| b'\x01' |  "abc"  |
+---------+---------+
| b'\x02' |  "qux"  |
+---------+---------+
| b'\x03' |  "bar"  |
+---------+---------+
| b'\x04' |  "baz"  |
+=========+=========+

I would like the following Python code:

db.range_near(b'\x02', limit=3)

To return the following key-value pairs:

+=========+=========+
|   key   |  value  |
+=========+=========+
| b'\x01' |  "abc"  |
+---------+---------+
| b'\x02' |  "qux"  |
+---------+---------+
| b'\x03' |  "bar"  |
+=========+=========+

gaurav · November 11, 2019, 5:52am

Would something like key-selectors help? For your previous example, we could fetch n/2 keys above and n/2 below the pivot key and then do another call to make up for any shortfall.
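
A minimal sketch of that idea, for illustration only. The names `range_around` and `FakeTransaction` are hypothetical; the fake mimics the shape of the Python binding's `tr.get_range(begin, end, limit, reverse)` so the example runs without a cluster. The bounds `lo` and `hi` are assumptions standing in for the range of whatever subspace is being searched. Instead of a key selector, the follow-up read starts at `last_key + b"\x00"`, which is the smallest key strictly greater than `last_key`.

```python
from collections import namedtuple

KeyValue = namedtuple("KeyValue", "key value")


class FakeTransaction:
    """In-memory stand-in for a transaction (hypothetical, for illustration)."""

    def __init__(self, data):
        self._rows = sorted(data.items())

    def get_range(self, begin, end, limit=0, reverse=False):
        # begin is inclusive, end is exclusive, limit=0 means "no limit".
        rows = [KeyValue(k, v) for k, v in self._rows if begin <= k < end]
        if reverse:
            rows.reverse()
        return rows[:limit] if limit else rows


def range_around(tr, key, limit, lo=b"", hi=b"\xff"):
    """Return up to `limit` pairs around `key`, in key order."""
    if limit <= 0:
        return []
    half = limit // 2
    # Skip the call when half == 0, because limit=0 would mean "unlimited".
    below = list(tr.get_range(lo, key, limit=half, reverse=True)) if half else []
    above = list(tr.get_range(key, hi, limit=limit - half))

    missing = limit - len(below) - len(above)
    if missing > 0 and len(above) == limit - half:
        # Too few keys below the pivot: read further above it.
        start = above[-1].key + b"\x00"
        above += list(tr.get_range(start, hi, limit=missing))
    elif missing > 0 and len(below) == half:
        # Too few keys at or above the pivot: read further below it.
        end = below[-1].key if below else key
        below += list(tr.get_range(lo, end, limit=missing, reverse=True))

    # `below` was read in descending order, so flip it before joining.
    return below[::-1] + above


tr = FakeTransaction({
    b"\x00": "foo", b"\x01": "abc", b"\x02": "qux",
    b"\x03": "bar", b"\x04": "baz",
})
print([kv.key for kv in range_around(tr, b"\x02", 3)])
# [b'\x01', b'\x02', b'\x03']
```

With a pivot at either end of the data, the second read fills the result from the other side, so `range_around(tr, b"\x00", 3)` still returns three pairs.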

amirouche · November 11, 2019, 7:50am

Yes, indeed key selectors can help.

Thanks.

amirouche · November 11, 2019, 4:34pm

Those will not necessarily be the nearest keys. To be sure that the returned keys are the nearest, one needs to fetch LIMIT keys above and LIMIT keys below, then compute the nearest keys among them. That is not bad: in my case, LIMIT is never more than 10.
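
A rough sketch of that approach, under stated assumptions. `range_near`, `byte_distance` and `FakeTransaction` are hypothetical names; the fake mimics the shape of the Python binding's `tr.get_range(begin, end, limit, reverse)` so the example runs without a cluster. What "nearest" means depends on the application, so the distance function is a parameter; `byte_distance` just compares keys as big-endian integers, which only makes sense for keys of equal length like the ones in the example.

```python
from collections import namedtuple

KeyValue = namedtuple("KeyValue", "key value")


class FakeTransaction:
    """In-memory stand-in for a transaction (hypothetical, for illustration)."""

    def __init__(self, data):
        self._rows = sorted(data.items())

    def get_range(self, begin, end, limit=0, reverse=False):
        # begin is inclusive, end is exclusive, limit=0 means "no limit".
        rows = [KeyValue(k, v) for k, v in self._rows if begin <= k < end]
        if reverse:
            rows.reverse()
        return rows[:limit] if limit else rows


def byte_distance(a, b):
    """Example metric (an assumption): difference of big-endian integers."""
    return abs(int.from_bytes(a, "big") - int.from_bytes(b, "big"))


def range_near(tr, key, limit, distance, lo=b"", hi=b"\xff"):
    """Return the `limit` pairs nearest to `key`, in key order."""
    if limit <= 0:
        return []
    # Read LIMIT candidates on each side of the pivot.
    below = list(tr.get_range(lo, key, limit=limit, reverse=True))
    above = list(tr.get_range(key, hi, limit=limit))
    # Keep the LIMIT candidates with the smallest distance to the pivot.
    candidates = sorted(below + above, key=lambda kv: distance(key, kv.key))
    return sorted(candidates[:limit], key=lambda kv: kv.key)


tr = FakeTransaction({
    b"\x00": "foo", b"\x01": "abc", b"\x02": "qux",
    b"\x03": "bar", b"\x04": "baz",
})
print([kv.key for kv in range_near(tr, b"\x02", 3, byte_distance)])
# [b'\x01', b'\x02', b'\x03']
```

This costs two range reads of at most LIMIT pairs each, which stays cheap for small values of LIMIT.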

(By the way, I am still not sure about the “approximate string matching” algorithm I wrote about in the original post).

Thanks!

gaurav · November 11, 2019, 4:59pm

Yes, if there is a notion of ‘nearness’ then one would need to fetch n keys on each side. I was trying to fetch n keys ‘around’ the pivot key, as mentioned in the example. I will read the original post more carefully and see if there is anything possible for it.
