Accessing key metadata for manual sharding

Hi all.

I was wondering - is it possible to access some of the internals around where underlying keys are stored?

For example, if you're running in double redundancy mode, data will be replicated to two machines. So if you have three machines, machine1, machine2, and machine3, any given key is stored on two of the three. This is fair enough.

What I would like to do is run an application as a sidecar alongside each machine. For example this could be something like Postgres, so we'd have [machine1, postgres1], [machine2, postgres2], [machine3, postgres3].

With this sidecar I'd like to know which keys in FoundationDB are stored on which machines. That way I could run a service to replicate those keys into each sidecar application, or take some action based on exactly where they're stored.

I looked at the Python API and didn't see an obvious way to do this. Ideally there would be an API I could call for a given key that would give me metadata about the key, such as which machines in the cluster the key is stored on.

There seem to be some details about how one might do this here. It's not clear to me from that document how to map a particular serverId to a physical host/storage server.

I'm pretty confident that no language binding has a method implemented for this. If you really want to know which keys are stored on which servers, you have to query the system keyspace: foundationdb/SystemData.cpp at main · apple/foundationdb · GitHub. The result value must be decoded: foundationdb/SystemData.cpp at main · apple/foundationdb · GitHub. The server ID will reflect the role ID in the status json.

A few words of caution here: this mapping is dynamic. Adding or removing storage servers, or simply writing more data into FDB, can change the current mapping. There is also no guarantee that the format won't change in the future, since it is mostly meant to be consumed by FDB-internal processes.
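To make that a bit more concrete, here is a minimal Python sketch of the raw read, assuming a recent API version: it lists entries under \xff/keyServers/ (which needs the access_system_keys transaction option) and builds a storage-ID-to-address map from status json. Decoding the keyServers values themselves (what decodeKeyServersValue does) is deliberately left out, since that encoding is the part with no stability guarantee.

```python
import json
import fdb

fdb.api_version(710)  # assumption: any recent API version works here
db = fdb.open()

@fdb.transactional
def read_key_servers(tr, limit=100):
    # The shard-to-server mapping lives in the system keyspace under
    # \xff/keyServers/, so the transaction must opt in to reading \xff keys.
    tr.options.set_access_system_keys()
    prefix = b'\xff/keyServers/'
    return [(k[len(prefix):], v)
            for k, v in tr.get_range_startswith(prefix, limit=limit)]

@fdb.transactional
def storage_id_to_address(tr):
    # status json lists each process's roles; a role with "role": "storage"
    # carries the storage server ID that the keyServers values refer to.
    status = json.loads(tr[b'\xff\xff/status/json'].wait())
    mapping = {}
    for proc in status['cluster']['processes'].values():
        for role in proc.get('roles', []):
            if role.get('role') == 'storage':
                mapping[role['id']] = proc['address']
    return mapping

for key, value in read_key_servers(db):
    # `value` is the encoded list of storage server IDs for the shard
    # starting at `key`; it still needs decoding per SystemData.cpp.
    print(key, value)
print(storage_id_to_address(db))
```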

I'm not sure what your use case for the sidecar is, but I would be cautious about using this approach. Is there any requirement that your sidecar know the exact location where data is stored?

That's correct! However, there seems to be some support for system data in the Record Layer here. I recently introduced something similar in the Tokio Rust binding.

I may be misunderstanding the question, but the locality API can be used to determine the current shard boundaries and the set of processes storing each shard (by getting the addresses for one key in each shard, such as its start key). I'm not sure there is an easy way to be notified when shards change or get moved, so you would have to poll this information periodically to keep it up to date.
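For example, a minimal sketch using the Python binding's locality helpers (the key range, API version, and poll-it-yourself loop are just illustrative assumptions):

```python
import fdb

fdb.api_version(710)  # assumption: any recent API version works here
db = fdb.open()

@fdb.transactional
def addresses_for_key(tr, key):
    # Storage server addresses currently responsible for `key`.
    return fdb.locality.get_addresses_for_key(tr, key).wait()

def shard_locations(db, begin=b'', end=b'\xff'):
    """Yield (shard_start_key, [storage addresses]) for the current shards."""
    # Boundary keys mark the start of each contiguous shard in [begin, end).
    for boundary in fdb.locality.get_boundary_keys(db, begin, end):
        yield boundary, addresses_for_key(db, boundary)

# Snapshot of the current mapping; shards move as data is written or
# rebalanced, so this has to be re-polled to stay up to date.
for start_key, addrs in shard_locations(db):
    print(start_key, addrs)
```

Keep in mind that get_boundary_keys is not fully transactional, so the boundaries you get back are an estimate of the shard layout rather than an exact snapshot at a single version.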
