FoundationDB Go Pagination


(Ravilution) #1

Hi, how do I do pagination in FDB with Go? Here is a simple example: I want to search users by keyword. My keys in FDB might look like /search/user/<keyword>/<username>. Imagine /search/user/donald/realDonaldTrump and /search/user/trump/realDonaldTrump as some sample keys. Let me know how to do a paginated search by keyword. Keywords can be full or partial when searching, so for example it can be tru, trum, or trump.

Edit 1
Based on the FDB Known Limitation, let us assume we are going to achieve pagination using GetRange(Begin, Limit) and not use End. For my first page of results, I would do something like:

Note: Not real code. Just representation.

/* assume there are /search/user/trump/realDonaldTrump1 to /search/user/trump/realDonaldTrump20 */
beginKey := '/search/user/trump/'
fdb.SelectorRange{Begin: fdb.FirstGreaterThan(beginKey), Limit: 10}

The last key of the first result page will be /search/user/trump/realDonaldTrump10. What is the beginKey for the next page? Using the full key of result 10 won't work, and if I use /search/user/trump/ I will get the first page again.


(Alec Grieser) #2

I think there are essentially two questions embedded in here: (1) how to search by keyword and (2) how to paginate. I’ll tackle those in order.

To search by keyword, the tricky part here appears to be the “prefix” part of the search, if I’m understanding correctly. You should be able to use the Tuple and Subspace layers to serialize the keys to the database in the correct order, and then it’s a matter of finessing the keys to read correctly. I think if your inserts are something like:

var userSubspace = subspace.Sub("system", "user")

func WriteUser(tr fdb.Transaction, username string, keyword string) {
    key := userSubspace.Pack(tuple.Tuple{keyword, username})
    tr.Set(key, []byte{})
}

Then reading is something like:

func ReadKeywords(tr fdb.ReadTransaction, keywordPrefix string) ([]string, error) {
    keywordBytes := userSubspace.Pack(tuple.Tuple{keywordPrefix})
    // Drop the tuple encoding's trailing 0x00 terminator so a partial
    // keyword ("tru", "trum", ...) matches as a raw byte prefix.
    rangeStart := keywordBytes[:len(keywordBytes)-1]
    rangeEnd, err := fdb.Strinc(rangeStart)
    if err != nil {
        return nil, err
    }
    fdbRange := fdb.KeyRange{Begin: rangeStart, End: fdb.Key(rangeEnd)}
    _ = fdbRange // then read the range (as discussed below)
    return nil, nil
}

For the pagination part, I think you could do something like:

func ReadPaginatedRange(tr fdb.ReadTransaction, fdbRange fdb.Range, continuation fdb.KeyConvertible, limit int) ([]fdb.KeyValue, fdb.KeyConvertible, error) {
    options := fdb.RangeOptions{Limit: limit}
    var effectiveRange fdb.Range
    if continuation == nil {
        effectiveRange = fdbRange
    } else {
        _, end := fdbRange.FDBRangeKeySelectors()
        effectiveRange = fdb.SelectorRange{Begin: fdb.FirstGreaterThan(continuation), End: end}
    }

    rr := tr.GetRange(effectiveRange, options)
    kvs, err := rr.GetSliceWithError()
    if err != nil {
        return nil, nil, err
    }

    if len(kvs) < limit {
        // Did not fill the limit, so there is no more data. Return nil as the continuation.
        return kvs, nil, nil
    }
    // Hit the limit. Return the last key in the slice as the continuation.
    return kvs, kvs[len(kvs)-1].Key, nil
}

My Go is a little rusty, but I think this is essentially what you want. Modulo any compilation errors (disclaimer: I haven't tried to compile this), the idea here is that it reads some number of keys and then returns a 3-tuple containing the data, a "continuation" key, and an error. The error just propagates any error encountered from FDB; the continuation is then used to resume the get-range, with a nil continuation passed into the function meaning "resume from the beginning" and a nil continuation coming out of the function meaning "there is no more data". So then you would call this with something like:

func ReadAllPages(db fdb.Database, fdbRange fdb.Range, limit int) ([]fdb.KeyValue, error) {
    var continuation fdb.KeyConvertible
    var results []fdb.KeyValue
    done := false
    for !done {
        res, err := db.ReadTransact(func(tr fdb.ReadTransaction) (interface{}, error) {
            kvs, continuationInner, eInner := ReadPaginatedRange(tr, fdbRange, continuation, limit)
            if eInner != nil {
                return nil, eInner
            }
            continuation = continuationInner
            return kvs, nil
        })
        if err != nil {
            return nil, err
        }
        // Append outside the closure so a retried transaction doesn't add its batch twice.
        results = append(results, res.([]fdb.KeyValue)...)
        done = continuation == nil
    }
    return results, nil
}

(Again, sorry, my Go is a little rusty.) But the idea is that a different transaction is used for each loop iteration; retriable errors encountered while paginating (like transient network failures, an fdb recovery, or a transaction hitting the 5 second limit) are retried by the ReadTransact call, but successful reads advance the continuation until the results slice has all of the results from all of the transactions. One still has to be at least a little careful not to read so much data from FDB that one OOMs one's process, but alas.

Something like that.


(Ravilution) #3

Hi @alloc, thank you for the response. Can you check the edit I have made to the question? I am confused about the continuation key you are mentioning. That is where I am stuck.


(Ravilution) #4

@alexmiller can you help me?


(gaurav) #5

Hi @ravilution, I believe your confusion is about how to get the next key to start your follow-up paginated request; such a key can be formed by appending a single byte, 0x00, to the last key (byte[]) obtained in the previous batch.

Probably something like this? I have implemented a helper method in Java to do the same (you can do something like this in Go):

    /**
     * Returns the immediate next key that would sort after the given key.
     * Just appends 0x00 to the passed byte[].
     */
    public static byte[] keyAfter(final byte[] key) {
        final byte[] n = new byte[key.length + 1];
        System.arraycopy(key, 0, n, 0, key.length);
        n[key.length] = 0x0;
        return n;
    }

(Ravilution) #6

@gaurav I am unable to explain my confusion in the best way. Let me try again. Imagine I have the below data in FDB:

\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00 '' 
\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00 '' 
\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00 ''

When the user searches with the keyword "trump", I will form a key like this: \x15\x12\x02search\x00\x02user\x00\x02trump. For the sake of this discussion, imagine LIMIT is 1.

The last key of the first page result will be \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00.

Now what should my key be for the next page, to get the key and value for \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00?

  1. If I use the last key in full (\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00), I will get no result.

  2. If I trim it to \x15\x12\x02search\x00\x02user\x00\x02trump, I will get the same old first page as a result.


(gaurav) #7

If you use the above key as the start key of the next request, you should get back all three rows. Why are you suggesting that using it as the start key will result in no results?

If you use the code snippet I posted, you would get back all rows AFTER the last row you got in the previous page.


(Ravilution) #8

@gaurav Thank you for the quick response. No disrespect intended; I am trying to understand the fundamentals. Below is what I get when I run this in fdbcli:

fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00

Range limited to 25 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'

(gaurav) #9

Please try the full syntax for getrange: Usage: getrange <BEGINKEY> [ENDKEY] [LIMIT]. I am not sure what the semantics are when using this command with only a begin key:

fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00 \xFF 3

Range limited to 3 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00' is `'1''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'2''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00' is `'3''
fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00 \xFF 2

Range limited to 2 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'2''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00' is `'3''

With extra \x00 appended (keyAfter)

fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00\x00 \xFF 3

Range limited to 3 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'2''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00' is `'3''
`\x15\x12\x14\x15\x03\x15\x01\x15\x01\x1a\x01h\xb7\x18\xfa\xe6\x15~\x1a\x01h\xb7\x18\xfa\x82' is `'

(gaurav) #10

fdb> help getrange
getrange <BEGINKEY> [ENDKEY] [LIMIT]

Fetch key/value pairs in a range of keys.

Displays up to LIMIT keys and values for keys between BEGINKEY (inclusive) and
ENDKEY (exclusive). If ENDKEY is omitted, then the range will include all keys
starting with BEGINKEY. LIMIT defaults to 25 if omitted.

For information on escaping keys, type `help escaping’.

Apparently, when you omit ENDKEY, getrange returns all keys starting with BEGINKEY, i.e. it treats BEGINKEY as a prefix.


(Ravilution) #11
getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00 \x15\x12\x02search\x00\x02user\x00\x02trump\xff 1

The above command works. Great! However, what are your thoughts on the FDB Known Limitation?


(gaurav) #12

Anything specific to it that you want to inquire about?

I think this limitation is roughly saying that an offset needs time/disk access linear in the size of the offset used, since the database has to skip that many keys after finding the pivot key.

If you can describe the use case you have in mind for large offsets, then someone here should be able to suggest an optimal approach.

Just to make sure: this offset limitation is unrelated to "searching" for the BEGIN and END keys in getrange operations (the operations you described above).


(A.J. Beamon) #13

This works well as he described, but another option, if you are using key selectors in your range query, is to use the last key of your previous result in a FirstGreaterThan key selector, as Alec did in this line:

effectiveRange = fdb.SelectorRange{Begin: fdb.FirstGreaterThan(continuation), End: end}
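As a toy model of why this removes the need for keyAfter (pure Go, no fdb dependency; a sorted string slice stands in for the keyspace): a FirstGreaterThan selector resolves to the first key sorting strictly after the given key, so the next page starts exactly one past where the previous page ended:

```go
package main

import (
	"fmt"
	"sort"
)

// firstGreaterThan models the FDB key selector of the same name against a
// sorted in-memory key list: the index of the first key strictly after k.
func firstGreaterThan(keys []string, k string) int {
	return sort.Search(len(keys), func(i int) bool { return keys[i] > k })
}

func main() {
	keys := []string{"trump/rdt1", "trump/rdt2", "trump/rdt3"}
	// Page 1 (limit 1) ended at "trump/rdt1"; resume from FirstGreaterThan of it.
	next := firstGreaterThan(keys, "trump/rdt1")
	fmt.Println(keys[next]) // trump/rdt2
}
```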

(Ravilution) #14

@gaurav what is an offset? In my case, I will potentially have millions of records where I search users by keyword. In our example above, it is possible I have thousands of usernames tagged with the trump keyword. My front end will only request 15 to 20 records at a time, fetching the rest as the user scrolls. So, imagine I am on the last page of a result set whose total is in the thousands. Does that make the offset big and slow?


(gaurav) #15

No, it does not make your offsets big. Since you are always giving a precise key (or key prefix) to start the retrieval (both for the first search and for the continuation pages), you will not have large offsets.

Offsets are used to jump forward or backward n positions from a given search key.
I have not come across a good use case myself for large offsets; one could be to directly get the nth person on the leaderboard after or before a given person.
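To illustrate, here is a rough in-memory model of selector resolution (not the fdb API; in the Go bindings the offset lives in the KeySelector struct's Offset field): the selector first finds a pivot key by search, then walks offset positions from it, which is why the cost grows with the offset:

```go
package main

import (
	"fmt"
	"sort"
)

// resolve loosely models key-selector resolution over a sorted key list:
// binary-search for the pivot (first key >= key), then step `offset` keys
// forward; that walk is what makes large offsets expensive in FDB.
func resolve(keys []string, key string, offset int) (string, bool) {
	i := sort.SearchStrings(keys, key) + offset
	if i < 0 || i >= len(keys) {
		return "", false
	}
	return keys[i], true
}

func main() {
	leaderboard := []string{"alice", "bob", "carol", "dave", "erin"}
	// The player two places after "bob":
	who, ok := resolve(leaderboard, "bob", 2)
	fmt.Println(who, ok) // dave true
}
```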