Schema evolution with multiple services

It seems like the record layer currently expects all nodes to upgrade or downgrade at the same time, and even with FDBMetaDataStore, there is no easy way to revert an upgrade (or otherwise run different but compatible versions of the proto definition concurrently).

The ideal situation is as follows:

  • A central library that compiles and publishes the Java proto artifacts as well as the configuration of indices/primary keys
  • Services that depend on said library but deploy on their own cadence
  • Assuming that metadata versions only move forward, if a service has an out-of-date version of the proto library, indices would be honored (with the latest metadata) and records could be written with an older proto definition (fields could be missing, for instance)

The above doesn’t seem to be possible even if we read RecordMetaData from the store at the start of each transaction (let’s assume what’s stored on disk is metadata with a higher version, while the code is baked with a proto/index configuration from an earlier time):

    RecordMetaData recordMetaData = getLocalRecordMetaData(); // what's baked into the service (from the proto library)

    // Build one template builder; copy it for every transaction.
    FDBRecordStore.Builder builder = FDBRecordStore.newBuilder()
        .setMetaDataProvider(recordMetaData)
        .setKeySpacePath(keySpacePath);

    RecordStoreProvider recordStoreProvider = cx -> {
        FDBMetaDataStore mds = new FDBMetaDataStore(cx, metadataKeySpacePath);
        mds.setLocalFileDescriptor(recordMetaData.getRecordsDescriptor());
        return builder.copyBuilder()
            .setMetaDataStore(mds) // also supplies the context
            .createOrOpen();
    };

The issue is that setLocalFileDescriptor causes loadFromProto in RecordMetaDataBuilder to call evolutionValidator.validateUnion(), which checks that the evolution from what’s currently stored inside FDB (via FDBMetaDataStore) to the “new” descriptor (what’s baked into the service) is valid. What’s odd is that the “new” descriptor in this case is actually old (a service can be behind in terms of deployment cadence, or simply behind w.r.t. its peers), so validation will fail if the proto has evolved to add a new field, for instance.
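To make the failure mode concrete, here is a minimal plain-Java sketch (not Record Layer code; the class, method, and field names are all invented) of why a “backwards” evolution fails validation: the validator treats the locally supplied descriptor as the newer one, so any field present in the stored union but absent locally looks like a field deletion:

```java
import java.util.Set;

public class EvolutionSketch {
    // Hypothetical stand-in for validateUnion: every field in the stored
    // descriptor must still exist in the "new" (locally supplied) one.
    static void validateUnion(Set<String> storedFields, Set<String> localFields) {
        for (String field : storedFields) {
            if (!localFields.contains(field)) {
                throw new IllegalStateException("field removed: " + field);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> stored = Set.of("id", "name", "email"); // evolved schema already in FDB
        Set<String> local = Set.of("id", "name");           // older schema baked into the service
        try {
            validateUnion(stored, local);
        } catch (IllegalStateException e) {
            // The lagging service fails to open the store.
            System.out.println(e.getMessage()); // prints "field removed: email"
        }
    }
}
```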

It seems to me that the only way to fix this is to not allow multiple services to have different local versions of RecordMetaData (assuming that one does need to manipulate messages with local proto classes), or effectively to put all services into maintenance mode while metadata upgrades are done. This, however, still calls into question how one could roll back a release without it complaining that the metadata version is newer while the code is behind and hence fields could be missing.

Or is there a whole different way to handle this situation?

Seems like this is captured in https://github.com/FoundationDB/fdb-record-layer/issues/965, so support for multi-client, multi-version deployment strategies is still a WIP. Work on that particular section of the code seems to have stalled, though (is help wanted?). The point raised in https://github.com/FoundationDB/fdb-record-layer/issues/843 is also not fully addressed even if FDBMetaDataStore is used (adding a new field, for instance, would not be downgradable).

I can only guess how Apple is using this internally, but it would seem to me that metadata is managed completely outside of the record layer and is upgraded separately. For instance, if multiple services require access to the store, one would need to know that all in-the-wild services have upgraded to a particular baked proto (local file descriptors can be ahead but not behind), and then a central system bumps the stored metadata version upwards (which also bumps the MetaDataCache).
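The coordination rule I have in mind could be sketched like this (plain Java, purely illustrative; the class and method names are invented): the central system may only bump the stored metadata version once every deployed service reports a baked proto at or above the target.

```java
import java.util.Collection;
import java.util.List;

public class MetaDataUpgradeCoordinator {
    // Hypothetical central check: the stored meta-data version may only move
    // forward once no in-the-wild service would end up behind it.
    static boolean canBumpStoredVersion(Collection<Integer> deployedProtoVersions, int targetVersion) {
        return deployedProtoVersions.stream().allMatch(v -> v >= targetVersion);
    }

    public static void main(String[] args) {
        System.out.println(canBumpStoredVersion(List.of(3, 3, 2), 3)); // prints "false": one service lags
        System.out.println(canBumpStoredVersion(List.of(3, 3, 3), 3)); // prints "true": safe to bump
    }
}
```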

I could imagine that the code could allow the local file descriptor to be behind (or at least allow fields to be added in the stored RecordMetaData), and this could fix the issue, but the unit tests seem to explicitly assert that this fails, so perhaps the original devs can comment on whether that’s really safe to do.

The issue you’ve pointed to is, in fact, all about improving the meta-data story. I’d say that yes, help is wanted there, though it would probably be good to coordinate any action in that area before significant work is undertaken, to make sure that the new work aligns with the project’s goals and won’t cause problems with some of the other work that’s currently being done in and around that area.

The local file descriptor stuff is all a part of that, though it’s not strictly speaking about upgrades. The idea of the local file descriptor is that you might want to use the FDBMetaDataStore to store your meta-data, but if you did so, you would get different file descriptor objects than the ones you’d expect because of how proto descriptors are constructed (i.e., pointer equality between descriptors would be lost), which violates part of the contract we had internally. So the local file descriptor was really more of a prerequisite for using the FDBMetaDataStore with anything other than dynamic messages. And yes, the idea is that you explicitly might want that file to be ahead but not behind. The reasoning is that being behind won’t always be safe: for example, if a field is added and then an index is added on that field, the indexing might not happen correctly. Even if the field is not indexed, there are situations where message serialization and deserialization can drop fields (e.g., an “increment field on record” method could drop unknown fields), so we want to make sure that anyone reading the data has a complete view of what the data actually are.

The approach to schema evolution that we’re sort of trying to suggest is that either:

  1. A service may choose to manage the meta-data in code, but then it needs to be able to guarantee that when it upgrades, everything that is using the meta-data can see the newest version of the meta-data before the meta-data is actually upgraded. This implies something like being able to dynamically generate the meta-data at different versions and then swapping in the right version. You can do this today with the right RecordMetaDataProvider, assuming that you also have the right hooks to start using the new meta-data when the time arises. For example, you could create a RecordMetaDataProvider that is also a UserVersionChecker and then control the meta-data version based on the user version that is read. But it’s a bit delicate and not currently implemented in the Record Layer directly. As you point out, this doesn’t really scale to multiple services unless they can all share the same logic and meta-data generation, and it also couples your schema evolution to your release cycle (i.e., a release that updates the meta-data cannot be rolled back).
  2. The service manages its meta-data in some kind of centralized store, e.g., the FDBMetaDataStore or an external database. I believe this works fairly reliably today, assuming that you read the meta-data from the store each time. The problem is that for large users, reading the meta-data each time will cause hot shards (or the equivalent on whatever external database they use), so for performance reasons some level of caching is necessary. If so inclined, one can write logic around the current abstractions to, for example, catch the stale meta-data exception and then flush the cache. The proposal for future improvements laid out in that issue is to make cache invalidation more natural by, for example, including the expected meta-data version in the “get meta-data from cache” request to know whether the cached value is stale.
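The version-selection idea from option 1 could be sketched in plain Java as follows (this is not Record Layer code; the class and method names are invented, and a String stands in for RecordMetaData): meta-data is generated at several historical versions, and the version actually used is chosen from the user version read from the store, so a store that has not been upgraded yet keeps seeing the older meta-data.

```java
import java.util.Map;
import java.util.TreeMap;

public class VersionedMetaDataProvider {
    // Hypothetical: meta-data generated at several historical versions,
    // keyed by version number (stand-in for generating RecordMetaData in code).
    private final TreeMap<Integer, String> generatedMetaData = new TreeMap<>(Map.of(
        1, "meta-data v1",
        2, "meta-data v2 (adds 'email' field)"
    ));

    // Stand-in for the UserVersionChecker idea: pick the meta-data based on
    // the user version read from the store header.
    String metaDataForUserVersion(int storedUserVersion) {
        return generatedMetaData.floorEntry(storedUserVersion).getValue();
    }

    public static void main(String[] args) {
        VersionedMetaDataProvider provider = new VersionedMetaDataProvider();
        System.out.println(provider.metaDataForUserVersion(1)); // prints "meta-data v1"
        System.out.println(provider.metaDataForUserVersion(2)); // prints "meta-data v2 (adds 'email' field)"
    }
}
```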

Both of those options have a fair amount of fiddly bits that need to live in application rather than Record Layer code (hence the issue to improve it), but it is doable.
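The cache-invalidation improvement proposed for option 2 could be sketched like this (again plain Java with invented names, where a String stands in for the meta-data): the caller passes the meta-data version it expects (e.g., read from the store header), and a cached entry older than that is refreshed rather than served.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class MetaDataCacheSketch {
    // Hypothetical cached value: a meta-data blob tagged with its version.
    record VersionedMetaData(int version, String metaData) { }

    private final AtomicReference<VersionedMetaData> cache = new AtomicReference<>();

    // The "include the expected version in the request" idea: a stale cache
    // entry is reloaded from the store instead of being served.
    VersionedMetaData get(int expectedVersion, Supplier<VersionedMetaData> loadFromStore) {
        VersionedMetaData cached = cache.get();
        if (cached == null || cached.version() < expectedVersion) {
            cached = loadFromStore.get(); // cache miss or stale: reload
            cache.set(cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        MetaDataCacheSketch sketch = new MetaDataCacheSketch();
        sketch.get(1, () -> new VersionedMetaData(1, "v1")); // initial load
        // A peer bumped the stored version to 2; the stale v1 entry is not served.
        VersionedMetaData latest = sketch.get(2, () -> new VersionedMetaData(2, "v2"));
        System.out.println(latest.version() + " " + latest.metaData()); // prints "2 v2"
    }
}
```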


Thanks @alloc. With the approaches given, I am assuming that using dynamic messages is still a requirement? If we use FDBMetaDataStore and read meta-data each time, it seems it is still going to throw if I supply a local file descriptor and the metadata in the FDBMDS is ahead (“field removed”). Since you mentioned that this method should work fairly reliably today, I am trying to understand whether that statement implies using dynamic messages for all interactions instead (meaning do not use a local file descriptor and handle serialization/deserialization manually).

In the ideal world, protobufs allow co-existence of code at different versions, ahead or behind (understanding that if you read and rewrite a message’s byte[] representation with older code, one might lose fields). I am curious to know how the system might break if the requirement that fields cannot be dropped is relaxed. Specifically, you mentioned that the indexing might not happen correctly; assuming that the rest of the metadata is valid, would that just look like a field being set to null if we read and then write a record originally written with a definition that’s newer than the current code’s? If so, one might be able to live with that as long as one understands the implications: essentially, data stored in new fields might be lost, but indices are still maintained properly. Competing writes might happen, and a no-op write could cause an index entry to be added and removed rapidly, as expected, between two versions.
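The field-loss behavior I am describing can be simulated with plain maps instead of protobuf messages (purely illustrative; the names are invented): old code reads a record, keeps only the fields it knows about, and writes it back, silently dropping the newer field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class FieldLossSketch {
    // Old code only knows these fields; a deserialize/reserialize round trip
    // that does not preserve unknown fields drops everything else.
    static final Set<String> KNOWN_FIELDS = Set.of("id", "name");

    static Map<String, String> readModifyWrite(Map<String, String> storedRecord) {
        Map<String, String> rewritten = new HashMap<>();
        for (String field : KNOWN_FIELDS) {
            if (storedRecord.containsKey(field)) {
                rewritten.put(field, storedRecord.get(field));
            }
        }
        rewritten.put("name", "renamed"); // the "modify a field on record" style update
        return rewritten;
    }

    public static void main(String[] args) {
        // Record written by newer code, with an 'email' field the old code doesn't know.
        Map<String, String> stored = Map.of("id", "1", "name", "old", "email", "a@b.c");
        Map<String, String> rewritten = readModifyWrite(stored);
        System.out.println(rewritten.containsKey("email")); // prints "false": new field silently lost
    }
}
```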

The best-case scenario for folks just starting to use the library (myself included) would be that one can have concurrent upgrades/downgrades, multiple in-the-wild proto versions, protos baked into code or a dependency (i.e., not requiring an external metadata management system), etc., with the caveats that once you deploy code that increments the metadata, it never goes backwards, and that fields can be lost when one reads and then writes the same record with an older proto definition. If one desires stricter limitations than that, perhaps one could set a stricter MetaDataEvolutionValidator. There should also be a way to permanently deprecate fields, for that matter.