RPC Layer Requirements and Design

Hello all,

We’ve published an initial draft of the requirements document for the RPC layer on the project’s GitHub wiki: https://github.com/apple/foundationdb/wiki/FoundationDB-RPC-Layer-Requirements. The intent of this layer is to provide a straightforward and stable protocol for clients that makes it easier to write pure language bindings (i.e. not dependent on our client library) and simplifies the story around versioning and upgrades.

If this is a topic that interests you, please take a look and let us know what you think! Feel free to provide any feedback directly in this forum thread or a new one, as appropriate.

This is great! We really need such a layer, both to increase the number of clients and to collect devops metrics.

There is no explicit mention of how “load balancing” across the processes of this layer will be done.

Since the draft already mentions that each independent process will provide its metrics in some form (push or pull), one could route clients to less loaded processes during the discovery phase only. That way, there is no need for explicit load balancing logic in the client driver.
Is this understanding correct?
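To make the question concrete, here is a minimal sketch of discovery-time routing. Everything in it is an assumption for illustration: the `RpcProcess` record, the `active_transactions` metric, and the addresses are hypothetical and do not come from the draft, which leaves the metrics format open.

```python
from dataclasses import dataclass


@dataclass
class RpcProcess:
    """Hypothetical view of one RPC process as seen by a discovery service."""
    address: str
    active_transactions: int  # assumed load metric; the draft does not name one


def pick_least_loaded(processes):
    """Return the process with the lowest reported load."""
    if not processes:
        raise ValueError("no RPC processes available")
    return min(processes, key=lambda p: p.active_transactions)


# Example: discovery hands the client the least loaded endpoint once,
# and the client driver itself carries no balancing logic.
processes = [
    RpcProcess("10.0.0.1:4600", 12),
    RpcProcess("10.0.0.2:4600", 3),
    RpcProcess("10.0.0.3:4600", 7),
]
chosen = pick_least_loaded(processes)
```

One caveat with this approach is that the choice is made only once, so load can drift after clients have been assigned.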

That’s a good question. Attaching clients to a single server process could be one way to enforce that transaction sessions are maintained with the same RPC server. I think we’re trying to leave the discovery aspect somewhat open-ended, such that identifying the RPC endpoints could be done through existing mechanisms that people operate or by connecting to the cluster using the cluster file, as we do now. We’d need to consider whether this makes sense in that context.

If we go the route of implementing load balancing in the client binding, one option could be to persist a connection to a random server process and then move if it fails or signals to you that it needs you to move. Or perhaps each transaction could choose an arbitrary server process from all or a subset of the available ones.
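As a rough sketch of those two options, assuming the binding already has a list of endpoint addresses: the class and function names below are hypothetical and not part of any existing binding or of the requirements draft.

```python
import random


class StickyEndpointPicker:
    """Option 1: stay on one randomly chosen server until told to move."""

    def __init__(self, endpoints, rng=None):
        if not endpoints:
            raise ValueError("no RPC endpoints available")
        self._endpoints = list(endpoints)
        self._rng = rng if rng is not None else random.Random()
        self._current = self._rng.choice(self._endpoints)

    @property
    def current(self):
        return self._current

    def move(self):
        """Call when the current server fails or signals the client to move."""
        others = [e for e in self._endpoints if e != self._current]
        if others:  # with a single endpoint there is nowhere else to go
            self._current = self._rng.choice(others)
        return self._current


def pick_per_transaction(endpoints, rng=random):
    """Option 2: choose an arbitrary server for each new transaction."""
    return rng.choice(list(endpoints))
```

With the first option a transaction session naturally stays on one server, while the second spreads load more evenly but needs each transaction to remain pinned to the server it started on.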

Any updates on this topic?

We decided not to target this feature for the 6.3 release, but it is still in our roadmap and work should be resuming on it soon.

Hi,

The GRPC layer looks like a really nice way to provide a simpler interface with the cluster for several reasons. The key ones that leap out at me are the obvious ones I think: the reduction in coupling between the layer above FoundationDB and the FoundationDB version; the removal of the need for the layer above to have access to the cluster file; easier visibility into client activity.

I have a couple of comments:

Because a single RPC instance may be handling transactions from multiple clients, it is possible that one of the RPC instances could be tricked into revealing data from one transaction to an unrelated client. Because access to the RPC layer instances grants access to the entire cluster, this may not be a significant issue.

There is an exposure problem here for layers that perform actions on behalf of multiple tenants. While such a layer would be performing the interaction with FoundationDB, and so would have validated client input to ensure “sensible” transactions, it necessarily assumes that FoundationDB is hardened as much as possible against revealing inappropriate data.

Mostly noting a way that I think this may be a significant issue :) (Perhaps I’m missing a nuance here that means this isn’t relevant!)

The RPC instances must also provide a mechanism for secure access from allowed clients, preferably including through use of mutual TLS. It may be desirable that this mechanism is configurable separately from that of the rest of the cluster, which would allow an operator to have independent restrictions to the RPC and non-RPC cluster interfaces.

I wanted to +1 this as it would be really nice to have an extra layer that restricts client access to the RPC layer (the obvious other layer being iptables or a reverse proxy in front).

Is it possible to use something more “PR friendly” as a language for this layer?

For example, could we implement some simple additional features that don’t always make sense to move to the application side, like a “count” implementation? I could contribute this if the layer is implemented in a more widespread language.

One option for how we implement the RPC layer (probably the favored one right now) would be to have RPC processes be part of the cluster like various other roles (transaction log, proxy, etc.). If we go this route, then these processes would need to be implemented in our C++-based Flow language.

If they are external processes (more akin to the backup and DR agents), then there is some potential flexibility. However, building them in something other than Flow has some trade-offs, in particular with our ability to simulate the process in our simulated testing, so we’d have to weigh those when deciding.