Accessing key metadata for manual sharding

Hi all.

I was wondering - is it possible to access some of the internals around where underlying keys are stored?

For example, if you're running in double redundancy mode, data will be replicated to two machines. So if you have three machines, machine1, machine2, and machine3, any given key is stored on two of the three. This is fair enough.

What I would like to do is run an application as a sidecar alongside each machine. For example this could be something like Postgres, so we'd have [machine1, postgres1], [machine2, postgres2], [machine3, postgres3].

With this sidecar I'd like to know which keys in FoundationDB are stored on which machines. That way I could run a service to replicate those keys into each sidecar application, or take some action based on exactly where they're stored.

I looked at the Python API and didn't see an obvious way to do this. Ideally there would be an API I could call for a given key that would give me metadata about the key, such as which machines in the cluster the key is stored on.

There seem to be some details about how one might do this here. It's not clear to me from that document how to map a particular serverId to a physical host/storage server.

I'm pretty confident that no language binding has a method implemented for this. If you really want to know which keys are stored on which servers, you have to query the system keyspace: foundationdb/SystemData.cpp at main · apple/foundationdb · GitHub. The result value must be decoded: foundationdb/SystemData.cpp at main · apple/foundationdb · GitHub. The server ID will reflect the role ID in the status json.

A few words of caution here: this mapping is dynamic. Adding or removing storage servers, or simply writing more data into FDB, can change the current mapping. There is also no guarantee that the format won't change in the future, since it is mostly meant to be consumed by FDB-internal processes.
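To make that a bit more concrete, here is a minimal Python sketch of the raw read, assuming a recent API version: it lists entries under \xff/keyServers/ (which needs the access_system_keys transaction option) and builds a storage-ID-to-address map from status json. Decoding the keyServers values themselves (what decodeKeyServersValue does) is deliberately left out, since that encoding is the part with no stability guarantee.

```python
import json
import fdb

fdb.api_version(710)  # assumption: any recent API version works here
db = fdb.open()

@fdb.transactional
def read_key_servers(tr, limit=100):
    # The shard-to-server mapping lives in the system keyspace under
    # \xff/keyServers/, so the transaction must opt in to reading \xff keys.
    tr.options.set_access_system_keys()
    prefix = b'\xff/keyServers/'
    return [(k[len(prefix):], v)
            for k, v in tr.get_range_startswith(prefix, limit=limit)]

@fdb.transactional
def storage_id_to_address(tr):
    # status json lists each process's roles; a role with "role": "storage"
    # carries the storage server ID that the keyServers values refer to.
    status = json.loads(tr[b'\xff\xff/status/json'].wait())
    mapping = {}
    for proc in status['cluster']['processes'].values():
        for role in proc.get('roles', []):
            if role.get('role') == 'storage':
                mapping[role['id']] = proc['address']
    return mapping

for key, value in read_key_servers(db):
    # `value` is the encoded list of storage server IDs for the shard
    # starting at `key`; it still needs decoding per SystemData.cpp.
    print(key, value)
print(storage_id_to_address(db))
```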

I'm not sure what your use case for the sidecar is, but I would be cautious about using this approach. Is there any requirement that your sidecar know the exact location where data is stored?

That's correct! However, there seems to be some support for system data in the Record Layer here. I recently introduced something similar in the Tokio Rust binding.

I may be misunderstanding the question, but the locality API can be used to determine the current shard boundaries and the set of processes storing each shard (by getting the addresses for one key in each shard, such as its start key). I'm not sure there is an easy way to be notified when shards change or get moved, so you would have to poll this information periodically to keep it up to date.
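For example, a minimal sketch using the Python binding's locality helpers (the key range, API version, and poll-it-yourself loop are just illustrative assumptions):

```python
import fdb

fdb.api_version(710)  # assumption: any recent API version works here
db = fdb.open()

@fdb.transactional
def addresses_for_key(tr, key):
    # Storage server addresses currently responsible for `key`.
    return fdb.locality.get_addresses_for_key(tr, key).wait()

def shard_locations(db, begin=b'', end=b'\xff'):
    """Yield (shard_start_key, [storage addresses]) for the current shards."""
    # Boundary keys mark the start of each contiguous shard in [begin, end).
    for boundary in fdb.locality.get_boundary_keys(db, begin, end):
        yield boundary, addresses_for_key(db, boundary)

# Snapshot of the current mapping; shards move as data is written or
# rebalanced, so this has to be re-polled to stay up to date.
for start_key, addrs in shard_locations(db):
    print(start_key, addrs)
```

Keep in mind that get_boundary_keys is not fully transactional, so the boundaries you get back are an estimate of the shard layout rather than an exact snapshot at a single version.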
