Forward Compatibility for Transaction Logs


(Alex Miller) #1

As part of working on issue #1048, modifications were made to the transaction log that change its on-disk format. As a result, some additional work was done to implement a plan for rolling back upgrades that modify the transaction log’s on-disk format. Below is the design doc that lays out this plan.

The implementation of the below was already merged as part of PR #1160.

We don’t currently have a design document section of the documentation, so I’m posting the design doc here until we do, at which point I’ll replace it with a link. But, for now, I suppose this makes quoting easier.


Forward Compatibility for Transaction Logs

Background

A repeated concern with adopting FoundationDB has been that upgrades are one way, with no supported rollback. If one were to upgrade a cluster from 6.0 to 6.1, then there’s no way to roll back to 6.0 if the new version results in worse client application performance or unavailability. In the interest of increasing adoption, work has begun on supporting on-disk forward compatibility, which allows for upgrades to be rolled back.

The traditional way of allowing rollbacks is to have one version, N, that introduces a feature but leaves it disabled. N+1 enables the feature, and then N+2 removes whatever was deprecated in N. However, FDB currently has a 6 month release cadence, and waiting 6 months to be able to use a new feature in production is unacceptably long. Thus, the goal is a sane, user-friendly upgrade path that supports rollback but still allows features to be used immediately if desired.

This document also carries two specific restrictions to the scope of what it covers:

  1. This document specifically is not a discussion of network protocol compatibility nor supporting rolling upgrades. Rolling upgrades of FDB are still discouraged, and minor versions are still protocol incompatible with each other.
  2. This only covers the proposed design of how forward compatibility for transaction logs will be handled, and not forward compatibility for FoundationDB as a whole. There are other parts of the system that durably store data, the coordinators and storage servers, that will not be discussed.

Overview

A new configuration option, log_version, will be introduced to allow a user to control which on-disk format the transaction logs are allowed to use. Not every release will affect the on-disk format of the transaction logs, so log_version is an opaque integer that is incremented by one whenever the on-disk format of the transaction log is changed.

log_version is set from fdbcli, with an invocation of fdbcli -C cluster.file --exec "configure log_version:=2". Note that := is used instead of =, to keep the convention in fdbcli that configuration options that users aren’t expected to need (or wish) to modify are set with :=.

Right now, FDB releases and log_version values are as follows:

Release    Log Version
pre-5.2    1
5.2-6.0    2
6.1+       3

If a user does not specify any configuration for log_version, then log_version will be set so that rolling back to the previous minor version of FDB will be possible. FDB will always support loading files generated by default from the next minor version. It will be possible to configure log_version to a higher value on the release that introduces it, if the user is willing to sacrifice the ability to roll back.

This means FDB’s releases will work like the following:

                6.0    6.1    6.2    6.3
Configurable    2      2,3    3      3
Default         2      2      3      3
Recoverable     2      2,3    2,3    3

Where…

  • “configurable” means values considered an acceptable configuration setting for fdbcli> configure log_version:=N.
  • “default” means what log_version will be if you don’t configure it.
  • “recoverable” means that FDB can load files that were generated from the specified log_version.

Configuring to a log_version will cause FDB to use the maximum of that log_version and default log_version. The default log_version will always be the minimum configurable log version. This is done so that manually setting log_version once, and then upgrading FDB multiple times, will eventually cause a low log_version left in the database configuration to act as a request for the default.
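The max rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior: the function name, its parameters, and the use of None for "never configured" are assumptions made for this sketch, not FDB's actual code.

```python
from typing import Optional


def effective_log_version(configured: Optional[int], default: int) -> int:
    """Return the log_version a release would actually use.

    `configured` is the value stored in the database configuration (None if
    the user never set one); `default` is the running release's default.
    Both names are hypothetical, chosen for this sketch.
    """
    if configured is None:
        return default
    # A low setting left over from an older release behaves like a request
    # for the default, because the default wins the max().
    return max(configured, default)


# On 6.1 (default 2): opting in to the new format early.
print(effective_log_version(3, default=2))  # 3
# On 6.2 (default 3): a leftover log_version:=2 resolves to the default.
print(effective_log_version(2, default=3))  # 3
```

Taking the maximum means a stale setting never has to be cleaned up by hand after later upgrades.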

As a concrete example, 6.1 will introduce a new transaction log feature with on-disk format implications. If you wish to use it, you’ll first have to configure log_version:=3. Otherwise, after upgrading to FDB 6.2, it will become the default. If problems are discovered when upgrading to FDB 6.2, you can then roll back to FDB 6.1. (Theoretically. See scope restrictions above.)

Implementation Details

fdbcli> configure log_version:=3 sets \xff/conf/log_version to 3. This version is also persisted as part of the LogSystemConfig and thus DBCoreState, so that any code handling the log system will have access to the log_version that was used to create it.

Changing log_version will result in a recovery, and FoundationDB will recover into the requested transaction log implementation. This involves locking the previous generation of transaction logs, and then recruiting a new generation of transaction logs. FDB will load \xff/conf/log_version as the requested log_version, and when sending an InitializeTLogRequest to recruit a new transaction log, it uses the maximum of the requested log version and the default log_version.

A worker, when receiving an InitializeTLogRequest, will initialize a transaction log corresponding to the requested log_version. Transaction logs can pack multiple generations of transaction logs into the same shared entity, a SharedTLog. SharedTLog instances correspond to one set of files, and will only contain transaction log generations of the same log_version.

This allows us to have multiple generations of transaction logs running within one worker that have different log_versions, and if the worker crashes and restarts, we need to be able to recreate those transaction log instances.

Transaction logs maintain two types of files: one is a pair of files prefixed with logqueue- that are the DiskQueue, and the other is the metadata store, which is normally a mini ssd-2 storage engine running within the transaction log.

When a worker first starts, it scans its data directory for any files that were instances of a transaction log. It then needs to construct a transaction log instance that can read the format of the file to be able to reconnect the data in the files back to the FDB cluster, so that it can be used in a recovery if needed.

This presents a problem: the worker needs to know all the configuration options that were used to decide the file format of the transaction log before it can rejoin a cluster and get far enough through a recovery to find out what that configuration was. To get around this, the relevant configuration options have been added to the file name so that they’re available when scanning the list of files.

Currently, FDB identifies a transaction log instance via seeing a file that starts with log-, which represents the metadata store. This filename has the format of log-<UUID>.<SUFFIX> where UUID is the logId, and SUFFIX tells us if the metadata store is a memory or ssd storage engine file.

This format is being changed to log2-<KV PAIRS>-<UUID>.<SUFFIX>, where KV PAIRS is a small amount of information encoded into the file name to give us the metadata about the file that is required. According to POSIX, the characters allowed for “fully portable filenames” are A–Z a–z 0–9 . _ - and the filename length should stay under 255 characters. This leaves _ as the only character not already used. Therefore, the KV pair encoding is K1_V1_K2_V2_..., so keys and values are separated by an _, and KV pairs are also separated by an _.
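As a rough sketch of the naming scheme just described, the Python below builds a file name from a set of key/value pairs. The helper names are hypothetical, and restricting keys and values to letters and digits is an assumption of this sketch (it keeps _ and - unambiguous as separators); FDB's real code may validate differently.

```python
import re

# Assumption of this sketch: keys and values may only contain letters and
# digits, so "_" and "-" can never appear inside a token.
_TOKEN = re.compile(r"[A-Za-z0-9]+")


def encode_kv_pairs(pairs: dict) -> str:
    """Encode pairs as K1_V1_K2_V2_... in insertion order."""
    tokens = []
    for key, value in pairs.items():
        for token in (str(key), str(value)):
            if not _TOKEN.fullmatch(token):
                raise ValueError(f"token {token!r} is not safe in a file name")
            tokens.append(token)
    return "_".join(tokens)


def tlog_file_name(pairs: dict, log_id: str, suffix: str) -> str:
    """Build log2-<KV PAIRS>-<UUID>.<SUFFIX> (hypothetical helper)."""
    name = f"log2-{encode_kv_pairs(pairs)}-{log_id}.{suffix}"
    if len(name) >= 255:
        raise ValueError("file name should stay under 255 characters")
    return name


# The suffix here is just a placeholder value for the example.
print(tlog_file_name({"V": 3, "LS": 2}, "46a5f353ac18d787852d44c3a2e51527", "fdq"))
```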

The currently supported keys are:

V     A copy of log_version
LS    log_spill, a new configuration option in 6.1

and any unrecognized keys are ignored, which will likely help forward compatibility.

An example file name is log2-V_3_LS_2-46a5f353ac18d787852d44c3a2e51527-0.fdq
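A worker scanning its data directory could recover these options with something like the Python sketch below. The function name, the mapping of keys to option names, and the assumption that values are integers are all illustrative choices for this sketch, not FDB's actual implementation. Unrecognized keys are skipped, as described above.

```python
# Keys this sketch understands; anything else is ignored for forward
# compatibility. The long option names are hypothetical labels.
KNOWN_KEYS = {"V": "log_version", "LS": "log_spill"}


def parse_tlog_options(file_name: str) -> dict:
    """Extract known options from a log2-<KV PAIRS>-<UUID>.<SUFFIX> name."""
    parts = file_name.split("-", 2)
    if len(parts) != 3 or parts[0] != "log2":
        raise ValueError(f"not a log2- file name: {file_name!r}")
    tokens = parts[1].split("_")
    if len(tokens) % 2 != 0:
        raise ValueError(f"unpaired key or value in {parts[1]!r}")
    options = {}
    # Tokens alternate key, value, key, value, ...
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key in KNOWN_KEYS:
            options[KNOWN_KEYS[key]] = int(value)  # assumes integer values
    return options


print(parse_tlog_options("log2-V_3_LS_2-46a5f353ac18d787852d44c3a2e51527-0.fdq"))
# {'log_version': 3, 'log_spill': 2}
```

Because the options sit before the UUID, they can be read from the directory listing alone, without opening the file or contacting the cluster.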

Testing

SimulationConfig has been changed to randomly set log_version according to what is supported. This means that with restarting upgrade tests that simulate upgrading from N to N+1, the N+1 version will see files that came from an FDB running with any log_version value that was previously supported. If N+1 can’t handle the files correctly, then the simulation test will fail.

ConfigureTest tries randomly toggling log_version up and down in a live database, along with all the other log related options. Some are valid, some are invalid and should be rejected, or will cause ASSERTs in later parts of the code.

I’ve added a new test, ConfigureTestRestart that tests changing configurations and then upgrading FDB, to cover testing that upgrades still happen correctly when log_version has been changed. This also verifies that on-disk formats for those log_versions are still loadable by future FDB versions.

There are no tests that mix the ConfigureDatabase and Attrition workloads. It would be good to do so, to cover the case of log_version changes in the presence of failures, but one cannot be added easily. The simulator calculates what processes/machines are safe to kill by looking at the current configuration. For ConfigureTest, this isn’t good enough, because triple could mean that there are three replicas, or that the FDB cluster just changed from single to triple and only has one replica of data until data distribution finishes. It would be good to add a ConfigureKillTest sometime in the future.

For FDB to actually announce that rolling back from N+1 to N is supported, there will need to be downgrade tests from N+1 to N also. The default in N+1 should always be recoverable within N. As FDB isn’t promising forward compatibility yet, these tests haven’t been implemented.
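The rule that the default in N+1 must be recoverable within N can at least be sanity-checked against the release table from the Overview. The Python below is only a table check under that assumption, with hypothetical names; it is not a substitute for the simulated downgrade tests described above.

```python
# Values copied from the Overview table; the structure and names are
# assumptions made for this sketch.
RELEASES = ["6.0", "6.1", "6.2", "6.3"]
DEFAULT = {"6.0": 2, "6.1": 2, "6.2": 3, "6.3": 3}
RECOVERABLE = {"6.0": {2}, "6.1": {2, 3}, "6.2": {2, 3}, "6.3": {3}}


def rollback_violations(releases, default, recoverable):
    """Return (newer, older) pairs where older cannot read newer's default."""
    violations = []
    for older, newer in zip(releases, releases[1:]):
        if default[newer] not in recoverable[older]:
            violations.append((newer, older))
    return violations


print(rollback_violations(RELEASES, DEFAULT, RECOVERABLE))  # []
```

An empty list means every release's default format is readable by the release before it, which is the property that makes a default-configured upgrade reversible on paper.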

Forward Compatibility for Transaction Logs Operational Guide

Notable Behavior Changes

When release notes mention a new log_version is available, after deploying that release, it’s worth considering upgrading log_version. Doing so will allow a controlled upgrade, and reduce the number of new changes that will take effect when upgrading to the next release.

However, one could entirely ignore the existence of log_version, and nothing bad will happen.

Observability

When running with a non-default log_version, the setting will appear in fdbcli> status.

Monitoring and Alerting

If anyone is doing anything that relies on the file names the transaction log uses, those file names will be changing.


(David Scherer) #2

I like the idea of supporting (limited) downgrades.

I have a couple of concerns about this feature from a UX perspective, though. It doesn’t resolve all problems with downgrades and it is very low level. So I would like to discuss a design for full forward compatibility. I think it could mostly be built on top of the merged functionality, but it might be worth changing the configuration format before a release.

I think what I would want is something like a “minimum downgrade version” in configuration.

It would have the same version-based namespace as API versions. This eliminates a lot of unnecessary operational knowledge about which versions of FDB support which log versions etc.

New databases would be created with a minimum downgrade version equal to the current version - they may not be downgraded and receive all features immediately. (This is an improvement over the documented design as I understand it.)

Upgrade recovery transactions would set the minimum downgrade version to the maximum of its previous value and the minimum supported by the upgraded version (they may increase it, but will do so as little as possible. Release notes for each version will document the minimum downgrade version supported.). Then when the administrator is satisfied that the upgrade does not need to be rolled back they can increase the minimum downgrade version using the CLI. (There should be friendly support for this, a warning if a change to minimum downgrade version makes the last upgrade irreversible, etc, not just a configure :=)

Decreasing the minimum downgrade version is not permitted. (It could be permitted if the database can prove that no incompatible features are in use, but this sounds tricky and doesn’t seem like a requirement to me)

The minimum downgrade version is the maximum API version that can be selected by the cluster’s clients. Clients using the latest API version basically guarantee that a downgrade will not be smooth. This should also prevent the creation of new mutations that are unsupported by the downgrade version of FDB (which would make a downgrade fail). (The latter could be accomplished without the former by validating each transaction, but then you would still run into clients unable to connect after a downgrade. There could conceivably be an orthogonal option to enable that behavior.)

Storage engines which were not supported at the minimum downgrade version may not be configured. Individual storage engines are required to be forward compatible to the minimum downgrade version; this is trivial if file format changes are accomplished via “new storage engines” but alternatively a storage engine may deploy forward-compatible file format changes based on the minimum downgrade version.

The log version is selected automatically to be the highest version permitted by the minimum downgrade version.
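To make the proposed rules concrete, here is a small Python sketch. Everything in it is hypothetical: the function names, the (major, minor) tuple representation, and in particular the reuse of the “recoverable” values from the original post’s table to pick a log version are assumptions for illustration, not an existing FDB feature.

```python
from typing import Dict, Set, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (6, 1)

# Which log versions each release can read, taken from the table in post #1.
RECOVERABLE: Dict[Version, Set[int]] = {(6, 0): {2}, (6, 1): {2, 3}}


def after_upgrade(previous_min: Version, min_supported_by_new: Version) -> Version:
    """An upgrade raises the minimum downgrade version as little as possible."""
    return max(previous_min, min_supported_by_new)


def raise_minimum(current_min: Version, requested: Version) -> Version:
    """An administrator may increase the minimum, never decrease it."""
    if requested < current_min:
        raise ValueError("decreasing the minimum downgrade version is not permitted")
    return requested


def log_version_for(min_downgrade: Version) -> int:
    """Pick the highest log version the minimum downgrade version can read."""
    return max(RECOVERABLE[min_downgrade])


# Upgrading a 6.0 cluster to 6.1 keeps the minimum at 6.0 (log version 2);
# once the administrator raises it to 6.1, log version 3 becomes usable.
current = after_upgrade((6, 0), (6, 0))
print(current, log_version_for(current))
current = raise_minimum(current, (6, 1))
print(current, log_version_for(current))
```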

I’m not sure if there are any impacts on backup, etc. (I don’t think so.)

We should do downgrade testing similarly to how we presently do upgrade testing, ensuring that it is always possible to downgrade successfully to any version >= the minimum downgrade version.


(Alex Miller) #3

This is a “feature” that was intended to be incomplete and low level, as I attempted to preface in the background. This is work that’s being done with the goal of eventually being a part of some larger “Forward Compatibility for FDB” design, like you’ve laid out. I don’t really see log_version sticking around in the long term, as it should be superseded by something like the “minimum downgrade version” support. This chunk of the work is being done now to prevent incoming tlog changes from being the reason that one can’t roll back from 6.1.X to 6.0, if testing turns out that there’s nothing else blocking that. Once something like “minimum downgrade version” exists, transitioning off of log_version should be relatively easy.

Ah, this thread seems to suggest that the answer is yes, which I also hadn’t thought about yet. Thanks!


(David Scherer) #4

Yes, I have no particular complaints about the PR. My suggestion is more that the design for forward compatibility overall should be discussed in public before implementing, and the sooner the better (since the design may impact the details of subfeatures like this one).

Coordinator state is another thing to think about, since it is particularly hard to make the on-disk format dependent on database-specific configuration.


(Alec Grieser) #5

That’s true, I suppose, though we do have a design section of the repository where we keep a few things: https://github.com/apple/foundationdb/tree/master/design

Unlike this forum, though, I suppose there isn’t as much of an opportunity for discussion.


(Meng Xu) #6

Yeah, we will probably move the only two documents under the design folder somewhere else.


(David Scherer) #7

Keeping design docs in the source repo isn’t such a bad idea, it just isn’t sufficient to enable public discussion of a design. Putting design docs in there and posting a forum thread announcing them might not be a bad way to do it?


(Austin Seipp) #8

FWIW, a lot of projects do something of this nature:

  • Create a patch which adds the rough draft of the design for feedback
  • Discussion ensues on the given patch (the PR submitted to GitHub)
  • Author refines the draft appropriately
  • Eventually, once consensus(!) is reached, you commit it to the repository.

Most of the time people call these “RFCs” and accept them only under certain conditions (project maintainers sign off, there is someone committed to executing on the task in a reasonable timeframe, etc etc). There are also occasional “post commit” updates you might make – noting in the RFC “what version this appeared in” or “superseded by RFC #12345”, etc.

GitHub is of course a code review tool and not a collaborative document editor so there is some friction here, but this approach works pretty well IMO. The real trick is the politics of it all – who gets the last say on accepting things, stuff of that nature.