Foundation DB Go Lang Pagination

Hi, how do I do pagination in FDB with Go? Here is a simple example: I want to search users by keyword. My keys in FDB might look like /search/user/&lt;keyword&gt;/&lt;username&gt;. Imagine /search/user/donald/realDonaldTrump and /search/user/trump/realDonaldTrump as some sample keys. Let me know how to do a paginated search by keyword. Keywords can be full or partial while searching, so for example tru, trum, or trump.

Edit 1
Based on the FDB Known Limitations, let us assume we are going to achieve pagination using GetRange(Begin, Limit). We are not going to use End. For my first result page, I would do something like:

Note: Not real code. Just representation.

/* assume there are /search/user/trump/realDonaldTrump1 to /search/user/trump/realDonaldTrump20 */
beginKey := '/search/user/trump/'
fdb.SelectorRange{Begin: fdb.FirstGreaterThan(beginKey), Limit: 10}

the last key of the first result page will be /search/user/trump/realDonaldTrump10. What is the beginKey for the next page? Using key number 10 in its entirety won't work, and if I use /search/user/trump/ I will get the first page again.

I think there are essentially two questions embedded in here: (1) how to search by keyword and (2) how to paginate. I’ll tackle those in order.

To search by keyword, the tricky part here appears to be the “prefix” part of the search, if I’m understanding correctly. You should be able to use the Tuple and Subspace layers to serialize the keys to the database in the correct order, and then it’s a matter of finessing the keys to read correctly. I think if your inserts are something like:

var userSubspace = subspace.Sub("system", "user")

func WriteUser(tr fdb.Transaction, username string, keyword string) {
    key := userSubspace.Pack(tuple.Tuple{keyword, username})
    tr.Set(key, []byte{})
}

Then reading is something like:

func ReadKeywords(tr fdb.Transaction, keywordPrefix string) ([]string, error) {
    keywordBytes := userSubspace.Pack(tuple.Tuple{keywordPrefix})
    // Drop the trailing 0x00 string terminator so the range covers all
    // keywords beginning with keywordPrefix, not just exact matches.
    rangeStart := keywordBytes[:len(keywordBytes)-1]
    rangeEnd, err := fdb.Strinc(rangeStart)
    if err != nil {
        return nil, err
    }
    fdbRange := fdb.KeyRange{Begin: rangeStart, End: fdb.Key(rangeEnd)}
    // then read the range (as discussed below)
}

For the pagination part, I think you could do something like:

func ReadPaginatedRange(tr fdb.ReadTransaction, fdbRange fdb.Range, continuation fdb.KeyConvertible, limit int) ([]fdb.KeyValue, fdb.KeyConvertible, error) {
    options := fdb.RangeOptions{Limit: limit}
    var effectiveRange fdb.Range
    if continuation == nil {
        effectiveRange = fdbRange
    } else {
        _, end := fdbRange.FDBRangeKeySelectors()
        effectiveRange = fdb.SelectorRange{Begin: fdb.FirstGreaterThan(continuation), End: end}
    }

    rr := tr.GetRange(effectiveRange, options)
    kvs, err := rr.GetSliceWithError()
    if err != nil {
        return nil, nil, err
    }

    if len(kvs) < limit {
        // Did not fill the limit, so there is no more data. Return nil as the continuation.
        return kvs, nil, nil
    }
    // Hit the limit. Return the last key in the slice as the continuation.
    return kvs, kvs[len(kvs)-1].Key, nil
}

My Go is a little rusty, but I think this is essentially what you want. Modulo any compilation errors (disclaimer: I haven't tried to compile this), the idea here is that it will read some number of keys and then return a 3-tuple containing: the data, a "continuation" key, and an error. The error just propagates any error encountered from FDB. The continuation is then used to resume the get-range: a nil continuation passed into the function means "resume from the beginning", and a nil continuation coming out of the function means "there is no more data". So then you would call this with something like:

type pageResult struct {
    kvs          []fdb.KeyValue
    continuation fdb.KeyConvertible
}

func ReadEntireRange(db fdb.Database, fdbRange fdb.Range, limit int) ([]fdb.KeyValue, error) {
    var continuation fdb.KeyConvertible
    var results []fdb.KeyValue
    for {
        res, err := db.ReadTransact(func(tr fdb.ReadTransaction) (interface{}, error) {
            kvs, next, eInner := ReadPaginatedRange(tr, fdbRange, continuation, limit)
            if eInner != nil {
                return nil, eInner
            }
            // Hand everything back through ReadTransact so that a retried
            // attempt cannot leave results or continuation half-updated.
            return pageResult{kvs, next}, nil
        })
        if err != nil {
            return nil, err
        }
        page := res.(pageResult)
        results = append(results, page.kvs...)
        continuation = page.continuation
        if continuation == nil {
            return results, nil
        }
    }
}

(Again, sorry, my Go is a little rusty.) But the idea is that a different transaction is used for each loop. Retriable errors encountered while paginating (like transient network failures, an fdb recovery, or a transaction hitting the 5-second limit) are retried by the ReadTransact call, while successful reads advance the continuation until the results slice has all of the results from all of the transactions. One still has to be at least a little careful to make sure that one doesn't read too much data from FDB and OOM one's process, but alas.
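The continuation pattern itself doesn't depend on FDB, so here is a standalone sketch in plain Go (using a hypothetical in-memory sorted slice in place of a real range read) of how each page resumes strictly after the last key of the previous one:

```go
package main

import (
	"fmt"
	"sort"
)

// page returns up to limit keys strictly greater than the continuation
// key `after` ("" means start from the beginning), plus the continuation
// for the next call ("" means no more data). keys must be sorted.
func page(keys []string, after string, limit int) ([]string, string) {
	start := sort.SearchStrings(keys, after)
	if start < len(keys) && keys[start] == after {
		start++ // resume strictly after the continuation key
	}
	end := start + limit
	if end >= len(keys) {
		return keys[start:], ""
	}
	return keys[start:end], keys[end-1]
}

func main() {
	keys := []string{"realdonaldtrump1", "realdonaldtrump2", "realdonaldtrump3"}
	var cont string
	for {
		var p []string
		p, cont = page(keys, cont, 1)
		fmt.Println(p)
		if cont == "" {
			break
		}
	}
}
```

In the real version each call to `page` would be a separate transaction, as above; the continuation is the only state carried between them.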

Something like that.

Hi @alloc, thank you for the response. Can you check the edit I have made to the question? I am confused about the continuation key you are mentioning. That is where I am stuck.

@alexmiller can you help me?

Hi @ravilution, I believe your confusion is regarding how to get the next key to start your follow-up paginated request. Such a key can be formed by appending a single byte, 0x00, to the last key (byte[]) obtained in the previous batch.

Probably something like this? I have implemented a helper method in Java to do the same (you can do something like this in Go):

    /**
     * Returns the immediate next key that would sort after the given key.
     * Just appends 0x00 to the passed byte[].
     */
    public static byte[] keyAfter(final byte[] key) {
        final byte[] n = new byte[key.length + 1];
        System.arraycopy(key, 0, n, 0, key.length);
        n[key.length] = 0x0;
        return n;
    }
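Since this thread is about Go, a direct translation of that Java helper might look like this (keyAfter is just the name used above, not part of the bindings):

```go
package main

import "fmt"

// keyAfter returns the immediate next key that would sort after the given
// key: a copy of the key with a single 0x00 byte appended.
func keyAfter(key []byte) []byte {
	n := make([]byte, len(key)+1)
	copy(n, key)
	n[len(key)] = 0x00
	return n
}

func main() {
	fmt.Printf("%q\n", keyAfter([]byte("\x02trump\x00"))) // "\x02trump\x00\x00"
}
```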

@gaurav I am unable to explain my confusion in the best way. Let me try again. Imagine I have the below data in FDB:

\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00 '' 
\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00 '' 
\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00 ''

When the user searches with the keyword "trump" I will form a key like this \x15\x12\x02search\x00\x02user\x00\x02trump. For the sake of this discussion, imagine LIMIT is 1.

The last key of the first page result will be \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00.

Now what should my key be for the next page, to get the keys and values starting from \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00?

  1. If I use the last key completely (\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00), I will get no result.

  2. If I trim it to \x15\x12\x02search\x00\x02user\x00\x02trump, I will get the same old first page as a result.

If you use the above key as the start key of the next request, you should get back all three rows. Why do you say that using it as the start key will result in no result?

If you use the code snippet I posted, you would get back all rows AFTER the last row you got in the previous page.

@gaurav Thank you for the quick response. No disrespect to you; I am trying to understand the fundamentals. Below is what I get when I run in fdbcli:

fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00

Range limited to 25 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'

Please try the full syntax for getrange (Usage: getrange &lt;BEGINKEY&gt; [ENDKEY] [LIMIT]). I am not sure what the semantics are of using this command with only a begin key:

fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00 \xFF 3

Range limited to 3 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00' is `'1''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'2''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00' is `'3''
fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00 \xFF 2

Range limited to 2 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'2''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00' is `'3''

With an extra \x00 appended (keyAfter):

fdb> getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00\x00 \xFF 3

Range limited to 3 keys
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump2\x00' is `'2''
`\x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump3\x00' is `'3''
`\x15\x12\x14\x15\x03\x15\x01\x15\x01\x1a\x01h\xb7\x18\xfa\xe6\x15~\x1a\x01h\xb7\x18\xfa\x82' is `'

fdb> help getrange
getrange <BEGINKEY> [ENDKEY] [LIMIT]

Fetch key/value pairs in a range of keys.

Displays up to LIMIT keys and values for keys between BEGINKEY (inclusive) and
ENDKEY (exclusive). If ENDKEY is omitted, then the range will include all keys
starting with BEGINKEY. LIMIT defaults to 25 if omitted.

For information on escaping keys, type `help escaping`.

Apparently, when you omit ENDKEY, getrange returns all keys starting with BEGINKEY.

getrange \x15\x12\x02search\x00\x02user\x00\x02trump\x00\x02realdonaldtrump1\x00 \x15\x12\x02search\x00\x02user\x00\x02trump\xff 1

The above command works. Great! However, what are your thoughts on the FDB Known Limitation?

Anything specific to it that you want to inquire?

I think this limitation roughly says that an offset needs time/disk access linear in the size of the offset used, since it has to skip that many keys after finding the pivot key.

If you can describe the use case you have in mind for using large offsets, then someone here should be able to suggest an optimal approach.

Just to make sure: this offset limitation is unrelated to resolving the BEGIN and END keys in getrange operations (for the operations you described above).

This works as he described, but another option, if you are using key selectors in your range query, is to use the last key of your previous result in a FirstGreaterThan key selector, as Alec did in this line:

effectiveRange = fdb.SelectorRange{Begin: fdb.FirstGreaterThan(continuation), End: end}

@gaurav What is an offset? In my case, I will potentially have millions of records where I search users by keyword. In the above example, it is possible I have thousands of usernames tagged to the trump keyword. My front end will only request 15 to 20 records at a time, and the rest as the user scrolls. So imagine I am on the last page of a result set whose total is in the thousands. Does that make the offset big and slow?

No, it does not make your offsets big. Since you are always giving a precise key (or key prefix) to start the retrieval (both for the first search and for the continuation pages), you will not have large offsets.

Offsets are used to jump forward/backward n positions from a given search key.
I have not come across a good use case myself for large offsets; one might be to directly get the nth person on a leaderboard after/before a given person.
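To make the offset idea concrete, here is a plain-Go illustration (no FDB involved; the leaderboard names are invented) of how a selector offset is resolved by walking entries from the pivot, which is why large offsets cost linear time:

```go
package main

import (
	"fmt"
	"sort"
)

// resolveSelector mimics how a key selector like "first key >= pivot,
// offset n" is resolved: find the pivot position, then walk n entries
// forward. That walk is the linear-time part the limitation describes.
func resolveSelector(keys []string, pivot string, offset int) string {
	i := sort.SearchStrings(keys, pivot) + offset
	if i < 0 || i >= len(keys) {
		return ""
	}
	return keys[i]
}

func main() {
	// A made-up leaderboard: the 2nd entry after "bob" is "dave".
	board := []string{"alice", "bob", "carol", "dave", "erin"}
	fmt.Println(resolveSelector(board, "bob", 2)) // dave
}
```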

@alloc Your response/explanation was super helpful, thank you for writing it! Do you mind explaining why we need to leave out the last byte from keywordBytes?

   keywordBytes := userSubspace.Pack(tuple.Tuple{keywordPrefix})
   rangeStart := keywordBytes[:len(keywordBytes) - 1]

Sure thing! The way strings are packed in the Tuple layer is as:

\x02 + null-escaped UTF-8 byte string + \x00

(See: https://github.com/apple/foundationdb/blob/main/design/tuple.md#unicode-string). This encoding means that if you have a bunch of keys all formed by userSubspace.Pack(tuple.Tuple{keyword}), then the keys within userSubspace are all sorted by keyword. So, for example, if your prefix is something like \x15\x01 and your words are “fox”, “hare”, “herd”, “herder”, “hermit”, and “giraffe”, then keys will look like:

\x15\x01\x02fox\x00
\x15\x01\x02giraffe\x00
\x15\x01\x02hare\x00
\x15\x01\x02herd\x00
\x15\x01\x02herder\x00
\x15\x01\x02hermit\x00

In general, to scan for all keys sharing a given byte-string prefix, you use the prefix itself as the beginning of the range and the "incremented" prefix (the original byte string with its last byte incremented by one) as the end of the range. (All ranges are inclusive of the first key and exclusive of the last key.) You can validate that this works by considering any byte string with the target prefix: it is always lexicographically greater than or equal to the prefix, but always lexicographically less than the "incremented" string. Conversely, all other strings are either less than the original prefix or greater than or equal to the incremented string.
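That range construction can be checked in plain Go with a simplified stand-in for fdb.Strinc (this sketch assumes the last byte of the prefix is not 0xFF; the real Strinc strips trailing 0xFF bytes first):

```go
package main

import (
	"bytes"
	"fmt"
)

// strinc returns the smallest key greater than every key beginning with
// prefix: the prefix with its last byte incremented by one.
func strinc(prefix []byte) []byte {
	out := make([]byte, len(prefix))
	copy(out, prefix)
	out[len(out)-1]++
	return out
}

func main() {
	begin := []byte("\x15\x01\x02he") // prefix for tuple-packed words starting with "he"
	end := strinc(begin)              // \x15\x01\x02hf
	for _, w := range []string{"fox", "hare", "herd", "hermit"} {
		key := append([]byte("\x15\x01\x02"), w...)
		key = append(key, 0x00)
		inRange := bytes.Compare(key, begin) >= 0 && bytes.Compare(key, end) < 0
		fmt.Printf("%s %v\n", w, inRange) // only herd and hermit are in range
	}
}
```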

For the specific case of looking for keywords with a given prefix, what you're effectively trying to capture is all byte strings prefixed by the user-subspace prefix plus the \x02 type code plus the prefix string. So, for example, if you want all strings beginning with "he", then you need to search for all FDB byte strings prefixed by \x15\x01\x02he. Note that, assuming everything is tuple-packed, it would actually be okay to use \x15\x01\x02he\x00 as the range start, and the results should always be the same (because the only key that is in one range but not the other is \x15\x01\x02he itself, and that key shouldn't be in the database as it is not a valid packed tuple), as long as the correct ending key (the "incremented" version of the string without the trailing \x00) is used.