Backwards compatibility of files generated in performant restore, milestoned for v6.3

We are utilising the backup files generated by FDB for running some analysis on our database. This PR mentions that a new backup format will be put in place, milestoned for FDB v6.3. We would like to be able to continue using our analysis code to parse backup files with the old format, even as we upgrade our database to v6.3 and later, for the newest bells and whistles.
Would it be possible to generate backup files with the old backup format, circa v6.2.7, if our database is upgraded to v6.3 and later? Otherwise, with every upgrade, we would have to modify our analysis code to cater to a new format.

Is your analysis code standalone, or is it based on the restore code?

FDB 6.3 is expected to have a new backup (which has a new log file format) and a completely new restore.
The compatibility relation is as follows:
The new restore can restore from both the old backup and the new backup;
The new backup can only be restored by the new restore.
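
A minimal sketch of the relation above as a lookup table. The names `old_restore`, `new_restore`, `old_backup` and `new_backup` are labels made up for illustration, not FDB identifiers, and the old-restore/old-backup entry is implied by the existing behaviour rather than stated here.

```python
# Compatibility between restore and backup implementations, as described in
# this thread for FDB 6.3. The string labels are illustrative only.
CAN_RESTORE = {
    ("old_restore", "old_backup"): True,   # existing behaviour (implied)
    ("old_restore", "new_backup"): False,  # new backup needs the new restore
    ("new_restore", "old_backup"): True,
    ("new_restore", "new_backup"): True,
}


def can_restore(restore: str, backup: str) -> bool:
    """Return True if the given restore can read the given backup."""
    return CAN_RESTORE[(restore, backup)]


if __name__ == "__main__":
    for (restore, backup), ok in CAN_RESTORE.items():
        print(f"{restore} <- {backup}: {'yes' if ok else 'no'}")
```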

Although we have code to convert the new backup log format to the old backup log format, the conversion will be slow.

Note that you can have both the old backup and the new backup running on an FDB 6.3 cluster. You will then have two backup file sets. Although the new restore plans to support both backups, it will prioritize optimizing for the new backup format.

The long-term plan (on a timescale of years, I think) will be:
Disable the old backup and old restore, and only support the new backup and new restore.

Thanks for your response.

Is your analysis code standalone, or is it based on the restore code?

Our analysis code is independent of the restore code, but dependent on the format of all backup files generated as of v6.2.7.

Note that you can have both the old backup and the new backup running on an FDB 6.3 cluster

This seems to seal the deal for us: we will be able to continue using our standalone code on the old backup generated from a cluster running v6.3, while we can run the optimised restore independently, if we need to, on the new backup being generated simultaneously. I understand this would mean having two backup file sets. That seems a better option for us than converting the new backup log format to the old one every time we want to run the analysis (which is regular, not ad hoc).

The long-term plan (on a timescale of years, I think) will be:
Disable the old backup and old restore, and only support the new backup and new restore.

Our long-term plan would also have to involve moving away from relying on the backup file format for our analysis.

I have a couple of follow-up questions:

  1. Will v6.3 continue to support all backup (not restore) functionalities even when taking the old backup, or will it be a reduced set of functionalities? For example, we would be interested to continue using the ability to have a continuous backup, the ability to specify snapshot intervals, etc. while the backup is generated in the old format.
  2. How long would the transfer from old backup format to the new backup format take, for a cluster, say, 100GB in size?

  1. The current plan is to support the old backup without reducing its functionalities in v6.3. The rollout of the new backup will take at least one FDB sub-version, say v6.4. Until we are 100% confident the new backup works as intended, we will not remove the functionalities of the old backup.

  2. The conversion can be slow. The new backup only changes the mutation file format, so the conversion time depends on how large the mutation files are. Right now, the converter is a single-process program, which can be slow. The converter is at https://github.com/apple/foundationdb/blob/master/fdbbackup/FileConverter.actor.cpp
    @jzhou wrote that converter. He may provide better suggestions.
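
Since the conversion time depends on the total size of the mutation files, a rough way to gauge the work is to add up their sizes. This is only a sketch: it assumes the backup sits in a local directory, and the subdirectory holding the mutation files is passed in by the caller because its name is an assumption here (check your own backup layout). `total_bytes` is a hypothetical helper, not part of any FDB tool.

```python
import os


def total_bytes(backup_dir: str, subdir: str) -> int:
    """Sum the sizes of all files under backup_dir/subdir, recursively.

    `subdir` is whichever directory holds the mutation files in your
    backup; the name is supplied by the caller, not assumed here.
    """
    root = os.path.join(backup_dir, subdir)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

The result is a byte count only; how long a single-process converter takes per byte would have to be measured on your own hardware.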

I’m interested in what type of analysis you are doing with the backup format. Is it running continuously, i.e., analyzing the streams of backup data?

BTW, we have two issues open for a similar problem:

  1. Use a standard backup format: https://github.com/apple/foundationdb/issues/2259
  2. Forensic tool based on backup data: https://github.com/apple/foundationdb/issues/1672

It would be great if we could join efforts to make the backup data easier to use for other non-restore purposes.

Thanks, that answers my questions!

Our analysis runs regularly, but not continuously. It effectively tries to recreate the state of the database at a given DbVersion, in order to calculate a checksum of specific subspaces. We require that DbVersion/timestamp-to-checksum mapping for some of our internal synchronisation needs. We could also have done this by restoring the backup to a specific DbVersion using standard fdbrestore and then doing multiple getRanges, but doing it directly from the backup files was clearly much quicker for us.
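
To illustrate the idea (not our actual code, and not the real backup file format), here is a minimal in-memory sketch. It assumes the snapshot and mutation files have already been decoded into a dict and a list of `(version, op, key, value)` tuples, that all mutations are later than the snapshot, and that only single-key `set` and `clear` operations occur; range clears and atomic operations are left out. `state_at` and `subspace_checksum` are hypothetical helper names.

```python
import hashlib
from typing import Dict, Iterable, Tuple

# Hypothetical decoded mutation: (version, op, key, value),
# where op is "set" or "clear". Not the on-disk backup format.
Mutation = Tuple[int, str, bytes, bytes]


def state_at(snapshot: Dict[bytes, bytes],
             mutations: Iterable[Mutation],
             target_version: int) -> Dict[bytes, bytes]:
    """Replay mutations with version <= target_version onto a snapshot copy."""
    state = dict(snapshot)
    # sorted() is stable, so mutations sharing a version keep input order.
    for version, op, key, value in sorted(mutations, key=lambda m: m[0]):
        if version > target_version:
            break
        if op == "set":
            state[key] = value
        elif op == "clear":
            state.pop(key, None)
    return state


def subspace_checksum(state: Dict[bytes, bytes], prefix: bytes) -> str:
    """SHA-256 over the length-prefixed key/value pairs under a key prefix."""
    h = hashlib.sha256()
    for key in sorted(k for k in state if k.startswith(prefix)):
        value = state[key]
        h.update(len(key).to_bytes(4, "big"))
        h.update(key)
        h.update(len(value).to_bytes(4, "big"))
        h.update(value)
    return h.hexdigest()


if __name__ == "__main__":
    snapshot = {b"a/1": b"x", b"b/1": b"y"}
    mutations = [(10, "set", b"a/2", b"z"), (20, "clear", b"a/1", b"")]
    print(subspace_checksum(state_at(snapshot, mutations, 20), b"a/"))
```

Length-prefixing each key and value keeps pairs such as `(b"ab", b"c")` and `(b"a", b"bc")` from hashing to the same byte stream.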

The issues you linked are pretty useful. A standard backup format would help streamline a lot of things. Regarding the forensic tool, we already do something similar - we have a separate subspace to store transaction history, mapped to unique id/timestamp. But we store it for transactions on our own model objects, and not for generic key-value pairs, as those are the transactions we care about. I’ll talk with my team internally and see where we can contribute, if we get resources for it.

With a faster restore now taking shape thanks to your efforts, we might try to move our standalone checksum analysis over to either use that, or store some kind of data in our transaction history subspaces to get the checksum easily at a given DbVersion/timestamp. Both cases would help avoid the need to parse backup files altogether.
