How to model a Leaderboard

Total novice here - exploring the possibilities. So please bear with me if I am asking something obvious.

I need a leaderboard with the following capabilities -

  1. Return top X ( usually 3) players with their corresponding scores
  2. For any given player, say Player A, return his/her ranking and score along with 2 players ahead and 2 players behind in ranking. Say, player P’s own ranking is 40, then I need 38th, 39th, 41st, 42nd players’ ranks and scores also.
  3. bonus - Return bottom most X players with their corresponding scores

Another aspect of it is - there will be multiple updates of a player leaderboard record as he/she keeps improving the score.

Can someone recommend a good or at least workable( low latency) approach in FDB?

Edit: Note, I need ranks of the players together with scores.


Do you want Python code?

Here is something to get you started in Python:

from operator import itemgetter

import fdb
from fdb.tuple import pack
from fdb.tuple import unpack


def get_score(tr, user):
    """Return the score of the user.

    Add the user to the leaderboard if they are not already there

    user_key = pack(('scores', user))
    score = tr.get(user_key)
    if score == None:
        # init the score at zero
        score = 0
        tr.set(user_key, pack((score,)))
        # Also store the user in the leaderboard.
        # Here the value is empty and the full data is encoded
        # in the key in order to make each key unique and keep
        # the leaderboard sorted.
        tr.set(pack(('leaderboard', score, user)), b'')
        score = unpack(score)[0]
    return score

def add(tr, user, increment=1):
    """Add INCREMENT to USER score.

    Return the new score of the user"""
    score = get_score(tr, user)
    total = score + increment
    # update the user's score...
    user_key = pack(('scores', user))
    tr.set(user_key, pack((total,)))
    # ...and the leaderboard
    tr.clear(pack(('leaderboard', score, user)))
    tr.set(pack(('leaderboard', total, user)), b'')
    return total

def top(tr, count=3):
    out = dict()
    # iterate over the whole 'leaderboard' namespace in reverse order
    iterator = tr.get_range_startswith(pack(('leaderboard',)), reverse=True)
    for key, _ in iterator:
        _, score, user = unpack(key)
        if score in out.keys():
        elif len(out.keys()) == count:
            out[score] = [user]

    # output a sorted dict (python >= 3.6)
    return dict(sorted(out.items(), key=itemgetter(0), reverse=True))

Here is an example REPL run:

In [1]: import fdb

In [2]: import leaderboard

In [3]: db =

In [4]: 

In [4]: leaderboard.add(db, 'amirouche', 42)
Out[4]: 42

In [5]:
Out[5]: {0: ['nitu'], 42: ['amirouche']}

In [6]: leaderboard.add(db, 'nitu', 42)
Out[6]: 42

In [7]:
Out[7]: {42: ['nitu', 'amirouche']}

In [8]: leaderboard.add(db, 'amirouche', 42)
Out[8]: 84

In [9]:
Out[9]: {42: ['nitu'], 84: ['amirouche']}

Feel free to ask question, there is no stupid question, especially at this point.

Also, I’ve been answering question about how to create applications ontop key-value stores on stackoverflow, see “Find by value on leveldb” and “Key Value Store in NoSQL Database Systems”.

BTW, before some ask the question, I don’t think it’s worth learning wiredtiger or leveldb or lmdb at this point (except maybe if you plan to do event sourcing like SkuVault).

edit: copy/paste failure

The above code doesn’t use Directory or Subspace I think the first step to improve that code is to use them.

This is definitely a good start. Any good idea how we can get a given player’s own rank without needing to scan through the entire records ? Suppose I am having 100K players- how to find a particular player’s rank at any given point of time ?


I think that the leaderboard subspace is somewhat small, you can fetch all of it. Say you have 100K players, the key is made of score + user_id each of which are unsigned 64 bits integers it’s less than 2 megabytes (100k * 64 * 2 / 1024) at 1000Mbs it 0.5 seconds to get the ranking of a random user…

Ok, maybe it requires another optimization. What you can do is create a rank subspace and re-compute the rank of each user everytime an user change their score. That will quiet slow down writes… Another approach would be to have another subspace where you recompute the ranks using a background job. Yet another approach if you hit the 5s transaction limit is to use yet another subspaces:

  • A adds: (timestamp) -> (increment, user) log that will store the add operations
  • A ranks: (rank, user) -> () subspace, that is updated by a background job based on add-events

Like I tried to say, above the simplest solution (that will not slow down all writes) to solve that requirement, is to have a ranks subspace that is updated by a background job.

Another solution, is to have specific service that stores the ranking of all users in memory or use REDIS. After all, the ranks is only 2Mb and it takes less than 1s to recompute.

Now that I think about it, with a specific service, one would not need for the leaderboard subspace and would speed up ‘writes’ while still being easy and fast to compute ranks and leaderboard.

It seems like there is no definitive answers here. It depends on the workload you have and the tradeoffs you are ready to do.

@amirouche , thanks. There are tradeoffs for sure. As you have mentioned REDIS has sorted set which is suitable for this purpose - that is what I have been using. However, I am seeing whether something similar I can pull off FDB for the same.

There’s another solution that I don’t think I’ve seen here that doesn’t require a background job or reading through the entire data set, but it does involve doing multiple reads from the database’s data structures. But then as soon as your data are inserted, you can start querying your index, which might be a desirable property.

The basic idea is to persist a skip list in FDB. Each level of the skip list is kept as a contiguous range, and the keys in each level are the indexed values (probably serialized with, say FDB tuples) and the values are integers encoding how many elements are under that level of the skip list. To determine the rank of any single item, you scan through one level of the skip list, adding up sizes until you get to a sign-post key that is greater than the key you are looking for. Then you go down one level, starting from your previous key and query additional elements. So, to query, this requires one range-read per level in your skip list. To update, you find the right element to update in each level, and add 1 to the value associated with each key in the level. To do splits, you need to recalculate the number of items under that level, but that’s not so bad. So it ends up being one read and one write per level in the skip list (with some of those reads being range reads).

I think that makes sense. It’s not the most straightforward data structure to implement (and there are optimizations that can be made to increase its tolerance contention that are somewhat subtle as well as additional optimizations to make it more likely to hit the client-side cache). I believe that there is a C# implementation that @KrzysFR wrote up (BSD licensed) of something along the lines of what is outlined above, though I don’t know how up-to-date that is kept.


Thanks Alec, It is a better in approach and seems to fit the requirements. I shall go through C# implementation you have pointed to.

This was actually a port of the python version that does not seem to be online anymore. I forgot how it works since then :sweat_smile:, but I belive @dave may help you there.

Just to point out that there is a newer and much better Java implementation used by the Record Layer here: