Versioning of special key space

The examples above would all need querying across modules, as the subranges would need different implementations.

Otherwise you need to redefine what module means and make it separate from the implementation. But this is super hard to do in an intuitive way. Is all of status one module? Or should status/cluster be a different module than status/client? I can make arguments for and against both approaches (and performance-wise, status/client can be implemented much more cheaply than status/cluster - and the plan is to not implement those using status json but instead to serve queries that only want to know a part of it).

To reiterate again, these comments are coming from a position of not knowing the current design but rather from trying to imagine how things could work. In particular, I don’t know if the way I’m using the word module matches how it’s been conceived previously.

I think that in your example, status could be a “module” that would allow arbitrary range querying of any subcomponents. Transaction stats would be a “module” that allowed arbitrary range querying of any subcomponents. A very simple implementation of this idea would be to not allow you to do range queries across top-level modules, which are likely to be unrelated.

A potentially more advanced approach would be for modules to allow or restrict arbitrary queries within them. That sounds like it may get complex both in terms of implementation and use, so I’m not wholly convinced that would be a good idea.

I think you misunderstood my point above. I don’t think you can define module properly (and status json is one example where this becomes problematic).

For example, if we flattened status json into a list of key-value pairs and implemented this in an optimized way, getRange("\xff\xff/status/client/coordinators/", "\xff\xff/status/client/coordinators0") would be a very cheap query. However, getRange("\xff\xff/status/", "\xff\xff/status0") would be very expensive.
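As a rough sketch of what that difference would look like from a client (using the Python bindings; the coordinators subrange key and the use of the read_system_keys option are assumptions for illustration, not the actual interface):

```python
import fdb

fdb.api_version(620)
db = fdb.open()

tr = db.create_transaction()
# Reading keys that start with \xff currently requires system-key access.
tr.options.set_read_system_keys()

# Cheap: only the coordinators subrange of status would be materialized.
coordinators = list(tr.get_range(b'\xff\xff/status/client/coordinators/',
                                 b'\xff\xff/status/client/coordinators0'))

# Expensive: this would force the whole status document to be assembled.
full_status = list(tr.get_range(b'\xff\xff/status/', b'\xff\xff/status0'))
```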

Do you now want to split this range to protect a user against a scenario where they would accidentally issue a more expensive query than intended? Do you now want to partition these modules by how you think they logically relate to each other? Do you hope that all of our users have the same intuition?

I think going down this path is troubling. Doing so for one key (everything else in there is cheaper than or as expensive as a key in the normal keyspace) seems a bit of an overreaction.

No, I don’t think we need to protect against every case for the protection to be useful. My main argument has been that if we don’t think there’s an argument to query across modules (and not necessarily all modules, but say top-level ones), then it’s risky and I think we should disallow it to alert users that they are doing something unintended.

I think this is somewhat defensible based on the fact that these operations are not reading keys but invoking arbitrary operations (which could of course actually just be reads). Also, the fact that we the developers might be inserting new modules into this space makes it different from the regular keyspace, where we aren’t going to make someone’s range return more data after upgrading (system keyspace excepted, as we provide few guarantees there). I think having one operation trigger unrelated actions has some downsides, but I’m not necessarily wedded to the idea that we have to disallow it if the benefits are greater.

Mainly what I’m trying to ascertain is whether allowing it is more useful than disallowing it would be, or if disallowing it would be more harmful than allowing it. Note also that I’m thinking more theoretically, not necessarily in reference to any particular implementation. Another good argument against disallowing these reads would be that there isn’t an easy or natural implementation of this.

I’m not sure it would be unfriendly to the user to report errors for such reads if we otherwise thought this was a good idea. In fact, my argument is that it’s in some ways more friendly if it protects them from doing bad things, given that they will find out sooner. That of course depends on whether we are also preventing them from doing useful things by disallowing it.

Ok in that case we can agree to disagree. So far we have two proposals:

  1. Obey the API version (only insert/remove ranges when the API version changes) but allow a user to construct any valid range query across this key space.
  2. Don’t version the key space but make sure that users can only query one subrange at a time.

I think (2) is in bad taste and lacks a definition of what module means in that context, and you think that (1) is unsafe as it could cause a user to query expensive subranges that they don’t actually need.

(2) has the benefit that it will be much easier for us to implement, though one reason for this forum post was that Alec was super unhappy about this and voiced his concerns in this PR.

I would like other people to voice their opinions - especially people who build layers.

Maybe another way to frame this that might focus the discussion is to ask it in parts:

  1. If we could determine if it was obviously wrong to query across (some) modules, would we want to tell the user? To me, this is where I feel most strongly that the answer is yes, though obviously we could disagree. I think this even if there existed cases where the wrongness was less clear and we couldn’t tell the user.
  2. Is it obviously wrong to query across modules? I’m not so certain about this one. My initial reaction is that a user should prefer to query some modules independently rather than scan them at the same time, but maybe not?
  3. Is it feasible to make the determination that it’s wrong to query across the modules? The top-level modules idea seems like a place where we could, but again I’m not as certain about this one.

Strong disagree, I think we should definitely version the key space. This was not what I was arguing.

Can you give me an example of such a case? I think at least for debugging purposes it would always be ok… Or we could have one client that queries everything every 10 minutes and writes it into Snowflake.

Maybe there is a case where it is wrong in 99.99% of cases? I just can’t think of one.

It doesn’t seem like you have the answer to that (yet). Maybe we could start by writing down potential stuff we could expose like this in the future to brainstorm?

I would also argue that if status json is the only thing that is problematic, your concern is probably too cautious. If there are more of these examples where a module would be much more expensive than a simple get operation, then there is a stronger argument for additional safety nets.

In that case I am completely confused. What would this look like?

By obviously wrong, I mainly meant that if we were telling someone what to do and not do, we would say “don’t range read across these modules, you should read them individually”. In other words, is it always better to read some modules individually rather than assume some ordering property in the key space and read them together? Maybe by analogy, an application might disallow range reads across separate directories that just happen to be located next to each other, because that’s brittle, and instead enforce that range reads strictly lie within a directory.

If that’s something you or other people want to be able to do, then it would be a good use-case for making it easy to range read the whole thing (maybe assuming the restriction that reading this key-space doesn’t have any side-effects, which I imagine is true). I can see the appeal of not having to update the query logic for every new module.

Well, this was mainly based on the fact that our first (I think) example of a key that does this is an expensive one, which suggests that it might not be rare. It seems reasonable to consider what expensive operations might do if we’re building a general purpose framework to support arbitrary operations, but if we really thought that was an exception to the rule we could discount it.

I think that versioning and allowing range scans are partially orthogonal. Mainly I was trying to say that it’s not a requirement of API versioning that you disallow new features at old versions, but if you want or need to disallow some new feature, then it’s reasonable to do so.

FDBBind dev here. I may not have all the context, so apologies if this was hashed out elsewhere :slight_smile:

I’m not sure that this is true, unless I’m misunderstanding the contract of API versioning. My understanding is that the bindings/client/server should be able to be updated without requiring any changes from the application as long as the requested API version is supported. I feel like allowing new special keys to be queried via range queries breaks this contract, since an existing query may now return a different result.

Yeah, this is why I somewhat carelessly described them as only partially orthogonal. There are two things here - one is that the specific questions I posed about whether to allow range scans are not really related to the API versioning question (i.e. I would have said the same things if we hadn’t discussed versioning at all), so in that sense those two parts of what I discussed are mostly unrelated. The other is that the choice to not restrict old versions is I think ok if you disable range scans (at least across modules), as you wouldn’t break any queries on other modules. It is indeed true that things become dicey if you allow the scans and don’t hide new features from old versions, which makes them not wholly orthogonal.

At any rate I don’t have strong opinions about hiding data from old versions, I only intended to describe a possible relaxation that you could opt to take advantage of if feasible and desired, though it seems like neither of those conditions will hold.

Maybe another interesting question that you’ve possibly already hashed out is what kind of contract is required when reading these keys with respect to errors or timeliness? For example, is it possible to get an error reading one of these keys, and if so what happens if you scan over it in a larger range query?

For timeliness, I’m imagining implementing something like the get versionstamp function in this API. Would you be allowed to return a future that doesn’t complete until the transaction commits, and if so what implications does that have for the range scan?

My assumption is that you wouldn’t allow a particular read to fail or block the range scan indefinitely. If so, how is that accomplished?

I guess that’s fair enough. Though I would like to have a more or less formal definition of what the API version should do. It could be “we never change existing ranges without an API version change but we could add more ranges” or it could be “we never change the special key space without changing the API version”. I think it should not be “every module decides what to do with the API version” (I’m not saying that this is what you’re proposing).

For Errors:
For most errors I would imagine we can use the existing errors that get could throw. I can’t think of many use-cases where those wouldn’t be adequate. I also think we should either return the whole range the user asked for or return only an error, as this would be consistent with the current behavior for the normal key space - if a range query goes to two storage teams and one of them returns while the other is down, we also don’t return half the result.

For Timeliness:
This is a bit more interesting. There are some ranges that only exist after a commit (versionstamp, commit version, conflict set, etc.). For those I think we should just return an empty set, and my argument would be the same as above: it is consistent with the behavior of the normal keyspace (we don’t return futures to values that will exist eventually). If you want to wait on the versionstamp we could think about supporting watches in the special key space.
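A small sketch of the contrast being described, using the Python bindings (the \xff\xff/versionstamp key and its empty-before-commit behavior are hypothetical, reflecting the proposal above rather than anything implemented):

```python
import fdb

fdb.api_version(620)
db = fdb.open()

tr = db.create_transaction()
tr.options.set_read_system_keys()

# Existing API: this future only becomes ready after the transaction commits.
vs_future = tr.get_versionstamp()

# Hypothetical special-key equivalent: under the proposal above, reading it
# before the commit would simply yield an empty range instead of blocking.
before_commit = list(tr.get_range(b'\xff\xff/versionstamp/', b'\xff\xff/versionstamp0'))
assert before_commit == []

tr.set(b'some key', b'some value')
tr.commit().wait()
print(vs_future.wait())  # the 10-byte versionstamp assigned at commit
```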

Hello

Gabi Kliot here, from the Snowflake FDBBind team. I think we need to first agree what a “module” is. We are suggesting to associate specific behavior with a key or a range of keys. Within that module there are specific semantics and the data is returned in some specific format. For certain modules the implementation just reads the underlying data from FDB, but for others it may read it from somewhere else, or may return partial data, or may return it in a different format. That is - the implementation “virtualizes” the data behind those keys or key ranges. It’s also a control API - by writing something I may be telling the module to do something.
I hope so far we are all in agreement, since this is the whole point of this proposal (and specifically looking at the PR https://github.com/apple/foundationdb/pull/2662).
This is similar in spirit to UNIX /procfs https://en.wikipedia.org/wiki/Procfs.

I look at those as essentially functions that FDB implements. The fact that we expose those functions via key/value reads and not another RPC API is just to piggyback on the existing transport implementation. Just like UNIX piggybacks on VFS, but really - /proc is not a file system, it’s a kernel API interface.
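A toy sketch of this “module = function behind a key range” view (none of these names come from the actual implementation in the PR; they only illustrate the dispatch model, including what disallowing cross-module ranges could look like):

```python
PREFIX = b'\xff\xff'

def coordinators_handler(begin, end):
    # Would ask the cluster for coordinator info rather than reading storage servers.
    return [(PREFIX + b'/status/client/coordinators/0', b'10.0.0.1:4500')]

def transaction_stats_handler(begin, end):
    # Would compute or fetch transaction statistics on demand.
    return [(PREFIX + b'/transaction/stats/commits', b'42')]

# Each "module" is just a prefix plus the function that materializes its data.
MODULES = {
    PREFIX + b'/status/client/coordinators/': coordinators_handler,
    PREFIX + b'/transaction/stats/': transaction_stats_handler,
}

def strinc(prefix):
    # First key after all keys with this prefix (naive; ignores trailing 0xff bytes).
    return prefix[:-1] + bytes([prefix[-1] + 1])

def special_get_range(begin, end):
    # Dispatch the read to the single module that contains the whole range;
    # a range spanning two modules is rejected rather than "bulk-invoking" both.
    for prefix, handler in MODULES.items():
        if prefix <= begin and end <= strinc(prefix):
            return handler(begin, end)
    raise ValueError("range is not contained within a single module")
```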

If we agree with my view above, then I don’t see a need to provide cross-module reads. In the vast majority of cases, as a user of this, I would not want to “bulk-invoke” unrelated functions from that interface. The only case I can think of is just for DR and replication. But really, DR here is up to the module. It’s like if I had an API server and wanted to DR it by invoking all its APIs and storing the responses. If I want that, I would probably write a “broadcast across modules” module and tell them all to do DR now.

As a user of that, I think I would be fine if you disallow cross-module interactions. If you still allow them - that’s OK too, but I don’t see myself using it much. Also, you can start by disallowing and allow later. It would be much harder the other way. Also, if we do allow it, as a user of this I now need to think about which common API version I use for a cross-module read. I would rather think about each separately.

Stepping aside from the cross-module reads question, I have other questions. Who decides which modules FDB supports and to which keys they are mapped? I suggest the following:

  1. everyone can develop a new module and associate it with any special key or key range within the special key space.
  2. as an admin of my FDB cluster, when I start the cluster I pick which modules to install and to which keys they are mapped. In theory it could be that Apple will have module X mapped to key 1 and Snowflake will map the same module to key 2 and also have module Y which Apple does not have at all.
  3. we will of course have predefined “blessed” modules in the FDB GH repo and a list of “standard” keys to which we recommend, by convention, mapping those (like status json). But in theory everyone can map them and have their own modules.

That way we avoid the need for global coordination in dividing the special key space and agreeing on modules. Kind of like I can pick my own patches in Linux and write my own device drivers.
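Purely as a hypothetical sketch of what such a per-cluster mapping might look like (nothing like this exists in FDB today; module names and keys are made up):

```python
# Hypothetical module manifest loaded at cluster startup, mapping special-key
# prefixes to the module implementations this cluster chooses to install.
INSTALLED_MODULES = {
    b'\xff\xff/status/':         'status_json',         # a "blessed" upstream module
    b'\xff\xff/usage_tracking/': 'rich_usage_stats',    # optional, has runtime cost
    b'\xff\xff/quota/':          'site_quota_manager',  # site-specific module
}
```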

I followed you until this point.

Are you suggesting a plugin interface for \xff\xff modules? I thought that this would purely just be a question of “who is running vanilla upstream” and “who is running a custom build”. If you’ve done custom modifications, and built and are running your own FDB client/server, then you can do whatever you wish, and the maintenance of that is on you. If you’re running vanilla upstream, then you get exactly what vanilla upstream does.

Otherwise, wouldn’t there need to be global coordination of “modules” to \xff\xff keys, so that anyone that isn’t an Apple or Snowflake employee knows how to write client code to access a module? Or are we providing an extra level of abstraction from module name → key range as a client API?

I see. I think we are mostly in agreement here. I in turn don’t have a strong preference about reading across modules. I think it makes more sense to allow it for the same reason that Markus has mentioned. Namely, that it keeps just one API across the whole key space. I would either expect no range reads to work, or arbitrary range reads to work. Having only some range reads work seems unintuitive, even if the ones that fail are not likely to be ones that I write. But again, I have less of a preference here :slight_smile:

To me, the big difference here is that there is no equivalent of a range read over files in UNIX (that I am aware of).

I think the name “module” might be confusing. In this context, I understand “module” to mean “a group of related functions which we would like to have under the same prefix under \xff\xff”. They will be fixed functions under a fixed prefix hard-coded into the source. They are not meant to be configurable.

I think this is a good point, and if we don’t find agreement about this specific point we should start by disallowing it.

I was thinking I pick which modules I decide to install. There are plenty, but maybe I don’t want one of them.
The reason is: let’s say one module is some rich statistics module on usage tracking. Maybe it has some runtime cost associated with this tracking. Someone may want to disable it. Of course we can have a special startup arg or option to say - don’t collect those stats. Or, like I propose here - if the stats collection module is not even installed, we don’t collect them.

So yes - I am suggesting a plugin interface for \xff\xff modules.
I think it will actually avoid global coordination. When I install a module I say, in my manifest or cluster file, that I want those modules mapped to those keys in the key space.

anyone that isn’t an Apple or Snowflake employee knows how to write client code to access a module

They will just be reading keys, since the syntactic API shape is just FDB KV. Of course, they need to understand the semantics of this data and this API. But they would not need to write any custom code.

Anyway, this is just one suggestion.
I was basically interested in what this post called a “module”. From the post it sounded to me like this is a well-known construct in FDB. But apparently we are still trying to decide what it means.

I think currently a module that is problematic could choose to be globally turned off or on depending on a configuration that we write into the \xff key space. For certain things (especially if we want to put configuration in there - or other more dangerous stuff) this makes a ton of sense to me.

But a full-blown plugin system seems a bit excessive. Can you give me a few examples of what a user might want to have in there?

I think maybe the idea of implementing something like get_versionstamp in this API is one reason why you might consider disallowing reading across the modules as one possible design choice. If you read a range that included get_versionstamp and, say, status, in the same request, one of those is allowed to return before a transaction commits and one isn’t (at least as we’ve currently defined the operation). Disallowing doing both operations in the same request is one way to avoid the weirdness that could ensue.

An alternative design choice to solve this would be for range reads over these keys to be empty or return some other signal that they were unable to complete. It’s also weird in its own way, but it has some nice properties too.

The same goes for errors. If one of these operations wanted to return an error, there’s a straightforward way to do that if you are performing only that operation. If your query is performing multiple operations, then probably we’ll want some way to signal errors (or, again, just omit results) rather than failing the whole thing. I think the alternative of failing the whole range probably artificially limits what your operations are allowed to do, as you’d have to consider the effects of error conditions on people querying the whole range.

Another interesting question I have about these is whether there are going to be operations that take parameters. As a contrived example, I could implement a read by calling get on something like \xff\xff/read_key/foo. If so, then a point read on this key would return something, but we probably wouldn’t want a range read to return anything.
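Continuing the toy dispatcher sketch from earlier (again, hypothetical names only; \xff\xff/read_key/ is the contrived example from this post), a module with a parameter encoded in its key might look like this:

```python
# A "module" whose point reads take a parameter encoded in the key, while
# range reads over its prefix deliberately return nothing.
READ_KEY_PREFIX = b'\xff\xff/read_key/'

def read_key_module_get(tr, key):
    # Point read: \xff\xff/read_key/foo is a request to read the normal key 'foo'.
    target = key[len(READ_KEY_PREFIX):]
    return tr.get(target)

def read_key_module_get_range(tr, begin, end):
    # Range read: there is no sensible enumeration of "all possible parameters",
    # so this module just reports an empty result.
    return []
```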

I guess that is also a reasonable solution.
The example I was thinking of is if we want some new type of throttling or quota management inside FDB. But maybe Apple wants a very different quota management solution. This way we could allow both.

In your proposal - how do we decide which special keys are used, which map to which module, and how do we make sure special keys are not mixed up?

But a full-blown plugin system seems a bit excessive.

You don’t have to support dynamic installation. Let’s say only at startup via the cluster file. So the only dynamism here is which special key each module is mapped to and whether the module is enabled or not.