FoundationDB

Implementing transactional getItems() in python -> getting error 2017


(Roman) #1

Hello,

I am trying to iterate over the items in a subspace and read them as key-value pairs.

I have created a directory and a subspace for it that holds all the relevant information about my data structure.
I have already populated it, and I am trying to iterate through it using Transaction.get_range(start, end) in Python.

Here is a snippet of my getItems() code:
@fdb.transactional
def GETItems(self, tr, subspace):
    range = subspace.range()
    start = range.start
    stop = range.stop
    for packedKey, packedValue in tr.get_range(start, stop):
        key = subspace.unpack(packedKey)
        valueTup = fdb.tuple.unpack(packedValue)
        yield ((key, valueTup,))

I get the following trace at execution:

main()

line 61, in main
for item in storeConnector.getItems(subspaceName):
StoreConnectorClasses.py", line 188, in GETItems
for packedKey, packedValue in tr.get_range(start, stop):
/fdb/impl.py", line 372, in iter
(kvs, count, more) = future.wait()
/fdb/impl.py", line 691, in wait
self.capi.fdb_future_get_keyvalue_array(self.fpointer, ctypes.byref(kvs), ctypes.byref(count), ctypes.byref(more))
/fdb/impl.py", line 1204, in check_error_code
raise FDBError(code)
fdb.impl.FDBError: b'Operation issued while a commit was outstanding' (2017)

If I don’t use a transaction and use a database object, this doesn’t happen. I am very puzzled about this error.

What would be the most efficient way to iterate over a subspace as an iterator?

Many thanks in advance!


(Alex Miller) #2

@fdb.transactional is incompatible with generators. I have a patch from some time ago to actually make this an error, that I never landed because the binding tester was broken at the time. I’ll try to actually commit that sometime soon. You’ll need to return a list instead of a generator.

@fdb.transactional produces a function that calls the wrapped one and then calls commit. When you yield, you return from the function, commit is called, and then a subsequent invocation of the generator tries to read from the database as part of iterating through the result of get_range. That read fails, because the @fdb.transactional wrapper already called commit on your transaction.
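
The sequence described above can be reproduced without a cluster. What follows is a toy model, not the real binding: FakeTransaction and toy_transactional are hypothetical stand-ins that only mimic the call-then-commit order (the real decorator also retries, and its commit is asynchronous). It shows the generator version failing and the list version working.

```python
import functools


class FakeTransaction:
    """Hypothetical stand-in for a transaction; this is not the fdb API."""

    def __init__(self, rows):
        self._rows = rows
        self.committed = False

    def get_range(self):
        # Lazy, like iterating a real range read: every step checks the
        # transaction state before handing out the next pair.
        for row in self._rows:
            if self.committed:
                raise RuntimeError("read issued after commit")
            yield row

    def commit(self):
        self.committed = True


def toy_transactional(func):
    """Call the wrapped function, then commit (no retry loop in this toy)."""

    @functools.wraps(func)
    def wrapper(rows):
        tr = FakeTransaction(rows)
        result = func(tr)
        tr.commit()
        return result

    return wrapper


@toy_transactional
def items_as_generator(tr):
    # Calling this only creates a generator object; the body has not
    # started running when the wrapper commits.
    for key, value in tr.get_range():
        yield key, value


@toy_transactional
def items_as_list(tr):
    # The whole range is read before the function returns, so before commit.
    return list(tr.get_range())


rows = [(b"a", b"1"), (b"b", b"2")]

print(items_as_list(rows))

try:
    list(items_as_generator(rows))
except RuntimeError as exc:
    print("generator failed:", exc)
```

In the toy, as in the real case, the first read of the generator version happens only when the caller starts iterating, which is after the wrapper has already committed.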


(Roman) #3

Thank you for the information.

The message had me puzzled, but after your explanation, looking at the implementation of @fdb.transactional makes perfect sense. I guess it won’t help much even if I try to reuse the code of @fdb.transactional without the commit.

Is there a way I can return a key-value iterator? I didn’t find one. Database seems to return the full list. I would really like to have an iterator in one particular scenario: I have about 1 million records in the key subspace, and the transaction that I used became stale. I might be wrong; I would need to run the test again to double-check whether the transaction really did.

I can most likely write my own method that implements a retry loop and keeps the last key I got, so that I can continue from the following key if anything goes wrong. The fact that the previous iterator is not fully consumed shouldn’t cause an issue.
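
A minimal sketch of that resume-from-last-key idea, assuming each batch is read in its own short transaction. The names iterate_in_batches, read_batch, and fake_read_batch are hypothetical, and an in-memory list stands in for the database so the sketch runs without a cluster.

```python
import bisect


def iterate_in_batches(read_batch, begin, end, batch_size=1000):
    """Yield (key, value) pairs from [begin, end), one short read per batch.

    read_batch(begin, end, limit) is a hypothetical callable that returns a
    list of at most `limit` pairs in key order, read in its own transaction.
    """
    while True:
        batch = read_batch(begin, end, batch_size)
        for key, value in batch:
            yield key, value
        if len(batch) < batch_size:
            return  # a short (or empty) batch means the range is exhausted
        # Resume just after the last key seen: key + b"\x00" is the
        # smallest key that sorts strictly after `key`.
        begin = batch[-1][0] + b"\x00"


# With the real binding, read_batch could wrap a transactional function
# along these lines (an assumption, not exercised here):
#     @fdb.transactional
#     def read(tr, begin, end, limit):
#         return list(tr.get_range(begin, end, limit=limit))
#     read_batch = lambda b, e, n: read(db, b, e, n)

# In-memory stand-in for the database.
STORE = sorted((b"k%03d" % i, b"v%d" % i) for i in range(10))
KEYS = [k for k, _ in STORE]


def fake_read_batch(begin, end, limit):
    lo = bisect.bisect_left(KEYS, begin)
    hi = bisect.bisect_left(KEYS, end)
    return STORE[lo:hi][:limit]


items = list(iterate_in_batches(fake_read_batch, b"k", b"l", batch_size=3))
print(len(items))
```

Because every batch would run in a separate transaction, the caller does not get one consistent snapshot of the subspace; writes that land between batches may or may not be seen.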

PS: I guess I could try to set the transaction timeout to infinite, but I would still need a retry mechanism.


(A.J. Beamon) #4

Calling db.get_range() does return a list, if that’s what you’re referring to, because the lifetime of the transaction created doesn’t extend beyond the function call. With tr.get_range(), you should instead get an iterator. Under the hood, it will be executing multiple range queries to the database as you advance through that iterator, so it won’t need to fetch all of the results in one shot.

As you’ve discovered, you can’t have your transactional function be a generator. To work around this, there are two basic approaches. All of your reads need to happen within the lifetime of the transaction, so you either need to collect the results that you want inside the transaction and then return them to the code that’s going to process them, or you need to move the processing code inside the transaction. You suggested that you’ll keep track of your last key and start your retry after that, which works well with the latter approach of trying to process your data within the transaction. You may not even need a custom retry loop to do that.

And also just in case it isn’t obvious, you can use a generator function within your transaction so long as that function isn’t transactional. Instead, you could call it from a transactional function. For example:

import fdb

fdb.api_version(600)
db = fdb.open()

class KeyCounter(object):
    def __init__(self):
        self.num_tries = 0
        self.num_keys = 0
        self.num_matches = 0
        # Keys are byte strings; b'' literals work on Python 2 and 3.
        self.start_key = b''

    # Plain (non-transactional) generator; only consumed inside a transaction.
    def get_first_element_from_values(self, tr):
        for k, v in tr.get_range(self.start_key, b'\xff'):
            yield (k, fdb.tuple.unpack(v)[0])

    @fdb.transactional
    def count_values_that_start_with(self, tr, desired_first_item):
        self.num_tries += 1

        for k, first_item in self.get_first_element_from_values(tr):
            self.num_keys += 1
            # Remember where to resume if the transaction is retried.
            self.start_key = k + b'\x00'

            if first_item == desired_first_item:
                self.num_matches += 1

        return self.num_matches

counter = KeyCounter()

print('Matches: %d' % counter.count_values_that_start_with(db, 1))
print('Tries: %d' % counter.num_tries)
print('Keys: %d' % counter.num_keys)

The output of this when I ran it with the 5 million keys I inserted:

Matches: 1000
Tries: 5
Keys: 5000000