Proxy layer for securing the cluster

Hello

We plan to use FDB with a large number of users, and security is a key concern for the entire ecosystem. This includes authentication, authorization, and restricting certain range operations (e.g., clear) to prevent both intentional and unintentional corruption of the data.

We are looking into different ways to implement this. One way is to natively enhance the cluster to build an auth layer - for example, the write proxy could intercept all write ops and validate the client and subspace/key range before executing the txn. But this seems to go against the general philosophy of FDB.

The second option is to have a layer which interfaces with the application clients and applies the security checks before sending the queries to the server. This sounds more like the right thing to do, but writing a layer has its own challenges.

  1. What will be the performance impact of introducing another layer between client and cluster?
  2. The protocol between the clients and the auth layer should be carefully designed so that the proxy layer stays truly stateless and the client stays as close to the cluster as possible. (What I really mean is that I don't want to invent another complex protocol between the client and the layer, with the layer then doing protocol conversion, which would make it computation heavy and the codebase hard to maintain.)
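
To make the "validate the client and subspace/key range" idea concrete, here is a minimal sketch of a check a proxy layer could run per operation without keeping any per-client state beyond a list of grants. Everything in it (the `Grant` type, the operation names, `is_allowed`) is hypothetical and not part of FDB; it assumes non-empty prefixes and that ranges are half-open `[begin, end)`, as in FDB.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

READ_OPS = {"get", "get_range"}
WRITE_OPS = {"set", "clear"}  # single-key writes


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission record: what one client may do under a prefix."""
    prefix: bytes
    can_write: bool = False
    can_clear_range: bool = False


def strinc(prefix: bytes) -> bytes:
    """Return the first key that sorts after every key starting with prefix."""
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        raise ValueError("prefix has no successor")
    return stripped[:-1] + bytes([stripped[-1] + 1])


def covers(grant: Grant, begin: bytes, end: Optional[bytes] = None) -> bool:
    """True if the key (end is None) or the range [begin, end) lies under the prefix."""
    if end is None:
        return begin.startswith(grant.prefix)
    return grant.prefix <= begin <= end <= strinc(grant.prefix)


def is_allowed(grants: Iterable[Grant], op: str, begin: bytes,
               end: Optional[bytes] = None) -> bool:
    """Decide whether an operation may be forwarded to the cluster."""
    for grant in grants:
        if not covers(grant, begin, end):
            continue
        if op in READ_OPS:
            return True
        if op in WRITE_OPS and grant.can_write:
            return True
        if op == "clear_range" and grant.can_clear_range:
            return True
    return False


grants = [Grant(prefix=b"app1/", can_write=True)]
print(is_allowed(grants, "set", b"app1/k"))                       # write inside the subspace
print(is_allowed(grants, "clear_range", b"app1/", b"app1/\xff"))  # range clear not granted
```

Keeping range clears as a separate permission from single-key writes is one way to address the "restrict clear" requirement: a client can update its own keys but cannot wipe a subspace in one call.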

Some notes about the actual application/clients -

The existing system we are trying to replace is a blob store, so we intend to use FDB as a native binary store. The clients rely heavily on indexes, and the server process has no need to understand the blob.

Tens of thousands of clients are common for a given cluster (we are aware of the current limit on the maximum number of concurrent clients for a given cluster).

Question -

Our use case seems to be a standard one, and it's possible that a lot of other folks have already solved this problem. If there is already a solution for this, that would be ideal; if not, any ideas/feedback are welcome.

It seems what you want is to build multi-tenancy. This is in itself a really hard problem.

We (Snowflake) do this by not giving access to FDB to any machine or service that we don’t control ourselves. But if I understand correctly, this is not what you want/need.

Having a secure proxy looks like the best solution to me. Having this built in FDB would be super hard for several reasons.

It will be one additional network roundtrip. I don’t think this would be too bad in general. I think the biggest drawback is that you will need to worry about another service (load balancing, replacing hosts, scaling, etc.) - but if you already have infrastructure for this kind of thing, it should be relatively straightforward, as the proxies would be stateless.

If you want to keep this simple, I would actually just implement a thin layer around RYWTransaction. The 5 second rule in FDB makes this super simple: your state would be some map of the form ID -> RYWTransaction object that you store in memory in the proxy. If a client dies you don’t really need to do anything - you can simply delete all transaction objects that are older than 5 seconds. Then you could use something like gRPC to implement the protocol. I think this should be cheap enough that one proxy would be able to handle many concurrent transactions.
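
A rough sketch of that ID -> transaction map, in Python for brevity. The class and method names are made up for illustration, and `create_txn` is a stand-in for whatever creates a real transaction in the proxy (with the Python bindings that might be something like `db.create_transaction`, but nothing here depends on FDB). Age is measured from creation, which is a conservative approximation of the 5 second limit.

```python
import time
import uuid
from typing import Any, Callable, Dict, Tuple


class TransactionRegistry:
    """Hypothetical in-memory map of transaction ID -> open transaction."""

    def __init__(self, create_txn: Callable[[], Any],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._create_txn = create_txn
        self._ttl = ttl_seconds
        self._clock = clock
        self._txns: Dict[str, Tuple[float, Any]] = {}

    def begin(self) -> str:
        """Open a transaction and return the ID the client uses to refer to it."""
        self.expire()
        txn_id = uuid.uuid4().hex
        self._txns[txn_id] = (self._clock(), self._create_txn())
        return txn_id

    def get(self, txn_id: str) -> Any:
        """Look up a live transaction; unknown or stale IDs raise KeyError."""
        entry = self._txns.get(txn_id)
        if entry is None or self._clock() - entry[0] > self._ttl:
            self._txns.pop(txn_id, None)
            raise KeyError(f"unknown or expired transaction {txn_id}")
        return entry[1]

    def finish(self, txn_id: str) -> None:
        """Forget a transaction after commit or explicit cancel."""
        self._txns.pop(txn_id, None)

    def expire(self) -> int:
        """Drop every transaction older than the TTL; return how many were dropped."""
        now = self._clock()
        stale = [tid for tid, (created, _) in self._txns.items()
                 if now - created > self._ttl]
        for tid in stale:
            del self._txns[tid]
        return len(stale)
```

Nothing has to be done when a client disappears: its entries simply age out on the next `expire()` call. An RPC handler (gRPC or otherwise) would call `begin()` for a new transaction and `get(txn_id)` for every subsequent read or write.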

Would this be done through the C API, or are you recommending accessing RYWTransactions directly?

Ah, good point - RYWTransaction is just a transaction in the C API…

In general I would think it would be better to use the C API. And this could be implemented in any language we have bindings for.

Personally, I would probably implement this in Flow - mostly because I think it would be much easier to test (and Flow makes async programming much easier). The Document Layer already does something similar.

So I think this boils down to taste.

The idea of a generic access control gateway for FDB has been discussed before. Ryan Worl released his version of this; see “Introducing the Redis protocol FDB Gateway”. That also culminated in a discussion of how to push this down into FDB itself.

During a previous conversation about restricting what clients can access, AJ brought up a very good point that the directory layer is written such that it is somewhat difficult to ACL by prefix. It uses \xfe to hold a map of directory name to shortened prefix, and thus all clients need to be able to see all of it to properly resolve their directories. Some \xff keys are needed for bootstrapping as well. Potentially leaving \xff and \xfe as read-only by default (but excluding \xff\x02 to all) would work, but it’s more than a trivial problem.
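
As an illustration of what such a default could look like inside a layer, here is a small longest-prefix rule table. It is only a sketch: the `Access` enum and `system_access` function are invented for this example, and the remark about \xff\x02 can be read more than one way, so the "no access" entry below is an assumption to adjust rather than a recommendation.

```python
from enum import Enum


class Access(Enum):
    NONE = 0
    READ = 1
    READ_WRITE = 2


# Hypothetical defaults for the reserved prefixes mentioned above.
SYSTEM_RULES = [
    (b"\xff\x02", Access.NONE),  # assumption: one possible reading of "excluding"
    (b"\xff", Access.READ),      # system keys needed for bootstrapping
    (b"\xfe", Access.READ),      # directory layer name -> prefix map
]


def system_access(key: bytes, default: Access = Access.READ_WRITE) -> Access:
    """Return the access level from the longest matching rule, else the default."""
    best = None
    for prefix, access in SYSTEM_RULES:
        if key.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, access)
    return best[1] if best else default
```

In a real layer the default for ordinary keys would come from per-client grants rather than a blanket read-write, and directory creation would still need some path that is allowed to write under \xfe.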

In the current state of things, I think pushing ACLing into a layer that you control is the correct approach. You can also run this as a layer process that is co-located with clients, so that the overall latency is roughly the same as if the client was directly issuing the operations. FDB 6.2 should contain some improvements to client scalability as well.

Thanks for the reply.

This is certainly one of the problems we need to sort out. We aren't using container-based deployment for FDB yet, so all of these issues need to be sorted out. I haven't explored TCP load balancers much yet, but I am sure there is something out there.

The real difference I see is that FDB is designed for developing high-performance services on top of FoundationDB, whereas we are trying to use FDB as a service. We have thousands of desktop-based clients and server processes (short and long lived) which operate on blob contents. This means I don't really need to build a layer on top of FDB.

These processes are very chatty and network hungry, so adding a layer only adds more complexity. This is why I want the auth layer to be as close to the cluster as possible.

In addition to authorization, the other things I’d be concerned about with your proposed configuration are coordinator changes and cluster upgrades. When one of the coordinators changes, the safe thing is to push out a new cluster file with the changed coordinator to all clients. If by “desktop client” you are referring to “personal computers controlled by individuals”, you will somehow need to communicate that change to each software owner. (Perhaps some MDM solutions make this easy.) Additionally, certain FDB upgrades (in particular, those that change the minor version, e.g., 6.0 to 6.1) require clients to download new client libraries to talk to the new server version (and/or do some more config if they want to stay up during the upgrade).

I think experience has shown that that can be made manageable if you’re running a service where you can control all of the clients (and make sure they all have the right config), but it doesn’t really scale to many desktop users doing their own thing. I think an auth/proxy layer in front of the cluster (that also has, say, a stable network protocol and can sit behind a stable URL or VIP) probably is the simplest thing for y’all from an operational standpoint, believe it or not. You could also possibly go with some kind of hybrid where the other “server” processes talk directly to FDB and the desktop clients go through the proxy (if you work out the security implications there), though that depends on particulars.

Yes, that's exactly what I mean. These are fat desktop clients sitting on individual users’ PCs. But we control the software that runs on their PCs. The current plan is to have a discovery service which will give out the latest coordinates of a given database, and the clients will use that service to find the latest cluster details.
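
For what it's worth, the client side of that could be as small as comparing the locally stored connection string with the one the discovery service hands out. A sketch, assuming the usual cluster file form `description:ID@host:port,host:port,...`; the function names are ours, and how the published string is fetched is left out on purpose.

```python
from typing import FrozenSet, Tuple

ClusterInfo = Tuple[str, str, FrozenSet[str]]


def parse_connection_string(text: str) -> ClusterInfo:
    """Split 'description:ID@addr1,addr2' into (description, ID, set of addresses)."""
    ident, _, addrs = text.strip().partition("@")
    description, _, cluster_id = ident.partition(":")
    if not description or not cluster_id or not addrs:
        raise ValueError(f"malformed connection string: {text!r}")
    return description, cluster_id, frozenset(a.strip() for a in addrs.split(","))


def needs_update(local: str, published: str) -> bool:
    """True if the published string differs from the local one (address order ignored)."""
    return parse_connection_string(local) != parse_connection_string(published)


local = "mydb:abc123@10.0.0.1:4500,10.0.0.2:4500"
published = "mydb:abc123@10.0.0.2:4500,10.0.0.3:4500"
print(needs_update(local, published))  # coordinator set changed
```

When `needs_update` is true, the client would rewrite its cluster file with the published string before reconnecting.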

Is there anything special you do to secure/restrict admin operations via fdbcli? For example, in extreme cases, can a devops person connect to a cluster via fdbcli and run a clear command?