Another API for querying and KeySelector

I am restarting my work on python asyncio foundationdb bindings.

One feedback I get a lot about foundationdb is about how to query the database: “Where is the declarative language”… This is mostly in comparison with SQL databases, even if eventually most people hide the SQL with an API with something like an ORM!

In fact, there is really 6 methods to know about to be able to be productive:

  • get (read)
  • set (create or update)
  • range (read)
  • range_startswith (read)
  • clear (delete)
  • clear_range (delete)

We can add to that get_key but so far I do not use it.

I want to reduce further the API surface in (my) python asyncio bindings and introduce a Transaction.query method that will merge the behavior of get, range and range_startswith together, also clear and clear_range will be a single method.

Here is the signature of Transaction.query:

Transaction.query(selector, other=None, limit=None)

In particular if other.key < selector.key then it means that results should be returned reversed and is equivalent to tr.range(selector, other, reverse=True)

Also, will make tuple.strinc public to be able to query by prefix (or add a prefix keyword argument?)

Similarly, to clear a range or single key there will be:

Transaction.delete(selector, other=None)

Case in point: I think this is more explicit how to query since there is a method called query.

Long story, I struggle with KeySelector:

I will not explain what I do not understand to avoid to add confusion except ask the question:

Why first_greater_or_equal has or_equal == False

Key selectors are used to implement a “cursor” that scans the B-Tree, in lexicographical order, looking for “real” keys that match the selector. Basically it scans existing keys one after the other until the key-selector “matches”, and then applies the specified offset (which can also be negative!)

From the documentation

To “resolve” these key selectors FoundationDB first finds the last key less than the reference key (or equal to the reference key, if the equality flag is true), then moves forward a number of keys equal to the offset (or backwards, if the offset is negative).

So in the case of the first_greater_or_equal (or_equal = false, offset = 1) it tries to find the last key that is NOT equal to what you want, and then advance by one.

For example if the db contains keys A B C D E F and you create a key selector (D, false, 1), it will try to find the last key that is before and NOT equal to D, which in this case is C. Then it will advance by 1 and will return D.

In the case where D does not exist (so db contains the keys A B C E F), it will still end up on key C, but when it advance by 1 key, it will return E.

If you look at first_greater_than it uses or_equal = true instead of false, but still with offset 1: In the case where key D exists, then the cursor will stop on D (which is the last key that is <= D), but still advance by 1 and end up on E. If D does not exist, then the cursor will stop on C, advance by 1 and end up on E again.

Finally, when thinking about end selectors for get_range there is an extra step: the end selector also selects a key (using the same logic as the begin key), but the range operator STOPS READING WHEN IT REACHES THAT KEY. So when you are dealing with the end selector, your job is to be able to select a key that is AFTER the last key you want in your range. This explains why begin selectors usually are some variation of first_great_or_equal, while the end selectors are usually a variation of first_greater_than. You could be tempted to use a last_less_than or last_less_or_equal for the end selector but then you may select the last key of your range by mistake, and the get_range would NOT return this key.

1 Like

One more thing: get(key) can be emulated with a get_range using begin selector first_greater_or_equal(key), and end selector first_greater_than(key). It will return either an empty range, or a range with a single result, depending on if the key exists or not.

Another variant is to use first_greater_or_equal(key + '\0') as the end selector, but it requires an extra memory allocation.

So you could have helpers that create a query that matches a single key that way.

1 Like

Thanks. I figured why I was confused: getRange(start, end) documentation is clear:

Database. get_range ( begin , end [, limit , reverse , streaming_mode ])

Returns all keys k such that begin <= k < end and their associated values as a list of KeyValue objects. Note the exclusion of end from the range. This read is fully synchronous.

I always forget that end is excluded.

Going through a REPL makes it clear what the KeySelector does:


In [2]: import fdb

In [3]: import fdb.tuple

In [4]: fdb.api_version(620)

In [5]: db = fdb.open()

In [7]: for byte in b'ABCDEF':
   ...:     db[bytes([byte])] = bytes([byte])
   ...: 
   ...: 

In [8]: db.get_key(fdb.KeySelector.last_less_than(b'C'))
Out[8]: b'B'

In [9]: db.get_key(fdb.KeySelector.last_less_or_equal(b'C'))
Out[9]: b'C'

In [10]: db.get_key(fdb.KeySelector.first_greater_or_equal(b'C'))
Out[10]: b'C'

In [11]: db.get_key(fdb.KeySelector.first_greater_than(b'C'))
Out[11]: b'D'