How to fetch all stored key-values at once


(Sakthivel) #1

Hi, I’m new to FDB. I have written a C client that inserts key-value pairs into the database, and I am able to retrieve the values for specific keys.

I have used fdb_transaction_get_range, but it fetches ranges of related keys; when my keys are unrelated, I have to feed in each key and get its respective value.

I’m looking for a way to get all stored key-value pairs at once. Is there a method available in FDB?

Thanks for your help in advance.


(A.J. Beamon) #2

I’m not sure I exactly understand the problem you are running into, but let me give you some details about how range reading works and maybe that will help to clear things up.

To answer your primary question, you can use fdb_transaction_get_range to read an arbitrary range with some caveats. The first is that a transaction’s lifetime is limited to 5 seconds, so if your range is too large to be read in that amount of time, you won’t be able to read it all at once. Second, the C API is lower level than what we provide in other bindings, and fdb_transaction_get_range may return early depending on your streaming mode. The expectation is that you will call it again with updated parameters if you need more of the range.

To issue a range read for the entire database, you would read from the empty string "" to \xff like so:

int rowLimit = 1000;    // 0 for unlimited
int byteLimit = 100000; // 0 for unlimited; the byte limit may be reduced by the streaming mode
int iteration = 1;      // used by FDB_STREAMING_MODE_ITERATOR, see discussion below
fdb_bool_t snapshot = 0;
fdb_bool_t reverse = 0;

FDBFuture* f = fdb_transaction_get_range(tr,
                                         FDB_KEYSEL_FIRST_GREATER_OR_EQUAL((uint8_t const*)"", 0),
                                         FDB_KEYSEL_FIRST_GREATER_OR_EQUAL((uint8_t const*)"\xff", 1),
                                         rowLimit, byteLimit, FDB_STREAMING_MODE_ITERATOR,
                                         iteration, snapshot, reverse);

When you get the result from the future f using fdb_future_get_keyvalue_array, there is an output parameter named out_more that will be set to true if the query failed to exhaust the range. This could happen if you set a row or byte limit that gets reached, if the streaming mode you use sets a byte limit that gets reached, or if the range you are reading spans multiple shards and would require requests to more storage servers to get the rest of the data.

When this happens, if you want to continue reading the range you will need to reissue your query with a new begin key (or end key, if reading in reverse) and update your explicit limits if you had any. You should also increment the iteration variable if you are using FDB_STREAMING_MODE_ITERATOR, because this streaming mode increases the byte limit each time you call it, up to a maximum.
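As a sketch, consuming the result might look like this (assuming the future f from the call above; error handling is elided):

```c
// Block until the range read completes, then inspect the results.
fdb_error_t err = fdb_future_block_until_ready(f);
const FDBKeyValue* kvs;
int count;
fdb_bool_t out_more;
if (!err)
    err = fdb_future_get_keyvalue_array(f, &kvs, &count, &out_more);

for (int i = 0; !err && i < count; ++i) {
    // Each result exposes kvs[i].key / kvs[i].key_length and
    // kvs[i].value / kvs[i].value_length. Copy anything you need to
    // keep: the memory is owned by the future.
}

// If out_more is true, reissue the query with updated parameters.
fdb_future_destroy(f);
```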

For example, if your first attempt to read the range above returns 100 keys totaling 10000 bytes with the last result having the key foo, you could reissue the request as follows:

int rowLimit = 900;    // Subtract the 100 results we got in the first query
int byteLimit = 90000; // Subtract the 10000 bytes we got in the first query
int iteration = 2;     // Increment the iteration
fdb_bool_t snapshot = 0;
fdb_bool_t reverse = 0;

// Notice that the begin selector now resolves to the first key after "foo"
FDBFuture* f = fdb_transaction_get_range(tr,
                                         FDB_KEYSEL_FIRST_GREATER_THAN((uint8_t const*)"foo", 3),
                                         FDB_KEYSEL_FIRST_GREATER_OR_EQUAL((uint8_t const*)"\xff", 1),
                                         rowLimit, byteLimit, FDB_STREAMING_MODE_ITERATOR,
                                         iteration, snapshot, reverse);

For the most part, the various streaming modes adjust the byte limits of your requests, as documented in the C API reference. There is a slightly different streaming mode named FDB_STREAMING_MODE_EXACT, which will always return the entire requested range in a single batch (i.e. out_more will be false). To use it, you are required to specify a nonzero row or byte limit. Also, if you use it on a range that’s too large to read in a 5 second window, the request will fail with an error without returning any results.

If you need to read a range that’s too large to read in 5 seconds, then the best you can do right now is to split the read up over multiple transactions. In this case, the range you read may not be consistent, but depending on your use case that may be ok. To do so, you would read your range in batches (as above), periodically switching to a new transaction. You could do this in response to an error or proactively after your transaction has been open for a sufficiently long amount of time (e.g. 3 seconds). You can see an example of this approach in our higher level bindings’ locality APIs (such as in Python).
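A minimal sketch of that batched, multi-transaction scan, assuming a hypothetical process() callback for consuming results and keys that fit in a fixed buffer (error handling via fdb_transaction_on_error is elided):

```c
#include <string.h>
#include <time.h>

uint8_t begin[4096];              // current resume key (copied out of each batch)
int begin_len = 0;
int first_batch = 1;              // use >= for the first batch, > thereafter

FDBTransaction* tr;
fdb_database_create_transaction(db, &tr);
time_t tr_start = time(NULL);

for (;;) {
    // Proactively switch to a fresh transaction well before the 5-second limit.
    if (time(NULL) - tr_start >= 3) {
        fdb_transaction_destroy(tr);
        fdb_database_create_transaction(db, &tr);
        tr_start = time(NULL);
    }

    FDBFuture* f = fdb_transaction_get_range(tr,
        begin, begin_len,
        /* begin_or_equal */ first_batch ? 0 : 1, /* begin_offset */ 1,
        (uint8_t const*)"\xff", 1, 0, 1,
        /* limit */ 0, /* target_bytes */ 0,
        FDB_STREAMING_MODE_WANT_ALL, /* iteration */ 0,
        /* snapshot */ 0, /* reverse */ 0);
    fdb_future_block_until_ready(f);

    const FDBKeyValue* kvs; int count; fdb_bool_t more;
    fdb_future_get_keyvalue_array(f, &kvs, &count, &more);  // errors elided
    process(kvs, count);          // hypothetical consumer

    if (count > 0) {              // remember where to resume
        memcpy(begin, kvs[count - 1].key, kvs[count - 1].key_length);
        begin_len = kvs[count - 1].key_length;
        first_batch = 0;
    }
    fdb_future_destroy(f);
    if (!more) break;             // the remainder of the range is exhausted
}
fdb_transaction_destroy(tr);
```

Note that each new transaction reads at a newer version, which is why the overall result may not be consistent.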


(Alex Miller) #3

If this is a question about whether there’s a “multi-get()” sort of call, then the thread Multi-gets/sets in client would be relevant.


(Sakthivel) #4

Hi @ajbeamon, thanks for your valuable response. I’m able to retrieve all stored data from the DB now.
Curious to know: is there any other way to retrieve all of the database’s data, e.g. using just the database name?

I’m using the following API to create the database. Is it possible to get the stored data using only the DB name below?
fdb_cluster_create_database(cluster, "DB1", 2);


(A.J. Beamon) #5

The only way to get data from the database is to open up a transaction and issue reads. Some of the higher level bindings provide convenience methods to do this, such as in python:

db = fdb.open()
data = db[:] # returns all data in the database

Under the hood this is just creating a transaction and reading the data in a retry loop. You could add your own convenience method in C to do the same given a database object.
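A sketch of such a helper in C, modeled on the bindings’ retry loops (do_full_scan is a hypothetical function, e.g. the batched get_range loop from earlier run within a single transaction):

```c
// Read the whole database with automatic retries. On a retryable error,
// fdb_transaction_on_error resets the transaction and we loop again.
fdb_error_t read_all(FDBDatabase* db) {
    FDBTransaction* tr;
    fdb_error_t err = fdb_database_create_transaction(db, &tr);
    if (err) return err;

    for (;;) {
        err = do_full_scan(tr);            // hypothetical: batched range read
        if (!err) break;

        // Ask the client whether the error is retryable; if it is, the
        // transaction is reset and the future resolves without error.
        FDBFuture* f = fdb_transaction_on_error(tr, err);
        err = fdb_future_block_until_ready(f);
        if (!err) err = fdb_future_get_error(f);
        fdb_future_destroy(f);
        if (err) break;                    // non-retryable error
    }

    fdb_transaction_destroy(tr);
    return err;
}
```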

Also, just as a side note, you cannot create a database with any name besides DB, so DB1 would be invalid. As of 6.1, you won’t be able to specify a database name at all, and you will create a database using a new function named fdb_create_database.


(Sakthivel) #6

Thanks @ajbeamon. So there is no direct method available in C like in Python; I have to create my own method for that. Thanks a lot for your response.


(Meng Xu) #7

If the db is huge and reading it takes longer than 5 seconds, will data = db[:] keep failing?