Feature Request: Additional atomic operations

How hard would it be to support additional atomic operations? A few ideas for such operations are below:

  1. a versionstamped field within the value subtracted from the current versionstamp
  2. support atomic operations on keys identified by FDB_KEYSEL_LAST_LESS_THAN / FDB_KEYSEL_LAST_LESS_OR_EQUAL / FDB_KEYSEL_FIRST_GREATER_THAN / FDB_KEYSEL_FIRST_GREATER_OR_EQUAL, i.e. cases where we don’t know the exact key we want to modify atomically
  3. add_extend - similar to atomic add, but when overflow of the integer field within the value is detected, it would store the given integer as a new field at the end of the value (without modifying the original location)
  4. ability to use atomic add on the last integer field within the value (the size is determined by param_length), i.e. it should work when the caller doesn’t know the number of integer fields within the value
  5. support addition with arbitrary precision integers
  6. dedicate a byte within a value to signal errors (integer field overflow/min/max)
  7. extend_max/extend_min - store the given integer as a new field at the end of the value (without modifying the original location) if the stored value is greater/less than the given one
  8. integer division/remainder
  9. modular arithmetic within a range passed via param_length, i.e. the value should wrap around instead of overflowing. For example, if we have decimal 2 in the integer field, allocate 3 bits for it, and use mod_add to add decimal 15, the result should be decimal 1 (17 mod 2^3). No bits outside of the integer field should be affected.
  10. ability to create a new key containing a backup of the old value when a condition (max/min/overflow) is detected
  11. population count
  12. bsr/bsl
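To make the intended semantics of #9 concrete, here is a minimal pure-Python sketch of what a hypothetical `mod_add` could do to a bit field packed inside a value. The function name, the big-endian bit addressing, and the `(bit_offset, bit_width)` parameters are all assumptions for illustration; this is not an existing FDB API.

```python
def mod_add(value: bytearray, bit_offset: int, bit_width: int, addend: int) -> bytearray:
    """Simulate the proposed mod_add: add `addend` to the `bit_width`-bit
    field starting at `bit_offset` (counting from the most significant bit),
    wrapping modulo 2**bit_width and leaving all other bits untouched."""
    # Interpret the whole value as one big integer (big-endian).
    n = int.from_bytes(value, "big")
    total_bits = len(value) * 8
    shift = total_bits - bit_offset - bit_width      # position of the field's LSB
    mask = ((1 << bit_width) - 1) << shift
    field = (n & mask) >> shift
    field = (field + addend) % (1 << bit_width)      # wrap instead of overflowing
    n = (n & ~mask) | (field << shift)
    return bytearray(n.to_bytes(len(value), "big"))

# A 3-bit field holding decimal 2; adding 15 wraps to (2 + 15) % 8 = 1.
v = bytearray(b"\x40")          # bits: 010 00000 -> 3-bit field = 2
out = mod_add(v, 0, 3, 15)
print(out[0] >> 5)              # -> 1
```

The key property is the mask: only the bits of the addressed field ever change, so adjacent fields in the same value are never corrupted.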

Atomics are relatively easy to implement as long as we know to which storage server the mutation has to go.

Do you have specific use-cases for the list above?

> Atomics are relatively easy to implement as long as we know to which storage server the mutation has to go.

How about cases #2 and #10?

> Do you have specific use-cases for the list above?

There are a few use cases.

Key update rate metric

We update some key atomically and want to measure how often it is updated. The idea is to dedicate one field within the value to a versionstamp and another field to the total number of updates, and to use suggested feature #1 (a versionstamped field within the value subtracted from the current versionstamp). This wouldn’t work alone and might need further features to get an average rate (or the rate within the last 5 minutes).
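A sketch of the read side of this idea, assuming a value layout of an 8-byte commit version followed by an 8-byte update counter, and assuming the usual convention that FDB commit versions advance by roughly 1,000,000 per second (both the layout and `decode_metric` are hypothetical):

```python
import struct

VERSIONS_PER_SECOND = 1_000_000  # FDB versions advance roughly 1M per second

def decode_metric(value: bytes, current_version: int) -> float:
    """Assumed value layout: 8-byte commit version of the first update,
    then an 8-byte update counter (both big-endian unsigned).
    Returns the average update rate in updates/second."""
    first_version, count = struct.unpack(">QQ", value)
    elapsed_s = (current_version - first_version) / VERSIONS_PER_SECOND
    return count / elapsed_s if elapsed_s > 0 else float("inf")

# e.g. 5,000 updates over 10 seconds' worth of versions -> 500 updates/sec
value = struct.pack(">QQ", 1_000_000_000, 5_000)
print(decode_metric(value, 1_000_000_000 + 10 * VERSIONS_PER_SECOND))  # -> 500.0
```

This only yields the average rate since the first update, which is why further features (or periodic resets of both fields) would be needed for a windowed rate.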

Colocation of multiple Invertible Bloom Lookup Table cells in a single FDB value

An IBLT requires multiple independent add operations on multiple integer fields. The structure supports blind updates; however, the problem is integer overflow. The IBLT needs an operation that wraps around without corrupting adjacent cells (see proposed features #9 and #6).
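A pure-Python sketch of the desired wrap-around behaviour, assuming (hypothetically) that the IBLT counters are packed as 16-bit big-endian cells in one value:

```python
import struct

def iblt_add(value: bytes, deltas: list) -> bytes:
    """Add one delta per 16-bit cell, each wrapping mod 2**16, so an
    overflowing cell never corrupts its neighbours (features #9/#6)."""
    cells = list(struct.unpack(f">{len(value) // 2}H", value))
    assert len(deltas) == len(cells)
    cells = [(c + d) % 0x10000 for c, d in zip(cells, deltas)]
    return struct.pack(f">{len(cells)}H", *cells)

v = struct.pack(">4H", 0xFFFF, 1, 2, 3)
v = iblt_add(v, [1, 0, 0, 0])       # first cell wraps to 0, others untouched
print(struct.unpack(">4H", v))      # -> (0, 1, 2, 3)
```

With a native atomic add, the first cell would instead overflow into (or saturate against) the adjacent cell, which is exactly what the proposal wants to avoid.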

Automatic page

We are working on an inverted index. This requires us to store references to FDB keys in FDB values. If our references are all the same size, we could store them under a single value atomically if we were able to append at the end (see proposed feature #7). However, FDB has limits on the value size. We want to automatically create the next page when we reach that limit (see feature proposal #10).

The data model for this use case is something like:

key/0 = ref1,ref2,ref3,ref4,…
key/1 = ref10,ref20,…

We want to automatically create the next key when we reach the maximum allowed value size.
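The page-rollover logic can be sketched as a plain read-modify-write over a toy in-memory store (a dict standing in for FDB; `append_ref` and the `max_size` parameter are hypothetical). FDB's actual value-size limit is 100 kB:

```python
MAX_VALUE_SIZE = 100_000  # FDB limits values to 100 kB

def append_ref(store: dict, prefix: str, ref: bytes, max_size: int = MAX_VALUE_SIZE) -> None:
    """Emulate proposed feature #10 with a read-modify-write: append `ref`
    to the last page under `prefix`, starting page n+1 when page n is full.
    In real FDB this would have to run in a transaction (and conflict under
    contention), which is exactly why an atomic version is wanted."""
    n = 0
    while f"{prefix}/{n + 1}" in store:
        n += 1
    key = f"{prefix}/{n}"
    page = store.get(key, b"")
    if len(page) + len(ref) > max_size:
        key, page = f"{prefix}/{n + 1}", b""   # roll over to a fresh page
    store[key] = page + ref

store = {}
for ref in (b"ref1", b"ref2", b"ref3"):
    append_ref(store, "key", ref, max_size=8)
print(store)  # -> {'key/0': b'ref1ref2', 'key/1': b'ref3'}
```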

Efficient changes feed

FDB doesn’t support watches on a range (see some related discussions: Changefeeds (watching and getting updates on ranges of keys) and Changes feed without hot keys).

At CouchDB we are trying to introduce sharding to replace our first implementation of the feature. There is an open RFC for the CouchDB project here.

We are looking for a way to create and remove shards automatically based on either the rate of updates or the number of documents. We probably want to double the number of shards as needed. The doubling can be achieved using proposed feature #12.
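One way doubling via bit shifts can work (a sketch; `shard_of` and the hash-to-shard mapping are assumptions, not anything CouchDB or FDB provides): with `2**level` shards, taking one more low-order bit of the document hash splits every shard into exactly two, so doubling never moves documents between unrelated shards.

```python
def shard_of(doc_hash: int, level: int) -> int:
    """Consistent sharding with 2**level shards: the shard id is the low
    `level` bits of the hash, so going from level to level+1 splits shard s
    into shards s and s + 2**level only."""
    return doc_hash & ((1 << level) - 1)

h = 0b10110
print(shard_of(h, 2), shard_of(h, 3))  # -> 2 6  (shard 2 splits into 2 and 6)
```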

Bloom filter in FDB value

A Bloom filter can be updated blindly using the XOR operation. However, sometimes it is useful to protect a filter from updates once it is overpopulated. We can implement such protection if we prevent the XOR when the population count exceeds a certain threshold (see #11).
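The guard condition can be sketched as follows (a hypothetical combined operation; today this check-then-XOR would need a transaction, since FDB's atomic ops are unconditional):

```python
def guarded_xor(value: bytes, update: bytes, max_popcount: int) -> bytes:
    """Apply a blind XOR update to a Bloom filter unless the filter is
    already overpopulated (combines #11 with a conditional update)."""
    popcount = sum(bin(b).count("1") for b in value)
    if popcount > max_popcount:
        return value                      # filter saturated: drop the update
    return bytes(a ^ b for a, b in zip(value, update))

f = bytes([0b1010, 0b0001])
f = guarded_xor(f, bytes([0b0100, 0b0000]), max_popcount=8)  # sets one more bit
```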

Implementing a LIFO queue with the ability to update the head

If the size of the items we intend to store in the queue is fixed and small, we can implement the queue as multiple keys, with multiple items stored in the same key. When we reach the FDB limit on value size, we create the next key. In order to implement such a scheme using atomic operations, we need to be able to extend the last key in the range (see #2); we also need to be able to append at the end and create a new key when we reach the FDB value limit (see #10). In order to update the head,
we need to read the last field from the value of the last key in the range (see #2 and #4).

This use case would probably require a take operation as well, which would remove the last field from the value. However, IIRC it cannot be atomic, since it returns a part of the value.
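A sketch of that non-atomic `take` as a read-modify-write over a toy dict store (the function, the fixed 8-byte item size, and the `prefix/n` key scheme are all assumptions for illustration):

```python
ITEM_SIZE = 8  # assumed fixed item size

def take(store: dict, prefix: str):
    """Transactional (non-atomic) take: pop the last fixed-size item from
    the value of the last key in the range. It cannot be a blind atomic
    op because it must return part of the value to the caller."""
    keys = sorted(k for k in store if k.startswith(prefix + "/"))
    if not keys:
        return None
    last = keys[-1]
    value = store[last]
    item, rest = value[-ITEM_SIZE:], value[:-ITEM_SIZE]
    if rest:
        store[last] = rest
    else:
        del store[last]              # drop the now-empty page
    return item

q = {"q/0": b"item0001item0002", "q/1": b"item0003"}
print(take(q, "q"))  # -> b'item0003'
```

Note that finding `last` here stands in for proposed feature #2 (resolving a key selector like FDB_KEYSEL_LAST_LESS_THAN on the storage server).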

As @markus.pilman perfectly summarized, it’s relatively easy to implement if the key is known. I think that covers all the items you listed except 2 and 10.

Within these, some (like 8 and 12) are relatively similar to the atomic operations we have today and should be relatively easy to implement. But the others are trickier, because they take more than one argument (for example 9) or require abstractions that FDB does not have (for example, 4 needs a notion of a “field”). It would be interesting to generalize this and make FDB able to run arbitrary atomic operations by supplying lambda functions in some way. But that’s not a small change.

I know it’s probably not what you asked (and has probably been considered), but I’d like to mention that all of the requested operations (except 1) can be achieved today with transactions. It might be helpful to implement them with transactions first and measure the performance impact. Splitting Bloom filters or fields across more keys may be needed to reduce conflicts.
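As one example of that approach, proposed op #3 (add_extend) reduces to a pure value transform that a transaction can apply after a read (the function and the layout of 8-byte big-endian fields are assumptions; in real code this would be wrapped in a retry loop or an `@fdb.transactional` function):

```python
import struct

def add_extend(value: bytes, addend: int) -> bytes:
    """Transactional emulation of proposed op #3: add to the first 8-byte
    unsigned field; on overflow, leave it untouched and instead append the
    addend as a new field at the end of the value."""
    field = struct.unpack_from(">Q", value)[0]
    rest = value[8:]
    if field + addend >= 1 << 64:                    # would overflow
        return value + struct.pack(">Q", addend)     # extend instead
    return struct.pack(">Q", field + addend) + rest

v = struct.pack(">Q", (1 << 64) - 1)
v = add_extend(v, 2)             # overflow -> value grows by one field
print(struct.unpack(">2Q", v))   # -> (18446744073709551615, 2)
```

The downside, as noted, is that the read creates a conflict range on the key, so hot keys will see transaction conflicts that a true blind atomic op would avoid.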

Also, the Record Layer has an implementation for handling arbitrarily large values with transactions, which might be helpful.