FoundatioDBBackup against azure blobstore - unknown error

I’m trying to set up backup to azure blobstore through a minIO gateway, but when I run
root@foundationdb-cluster-backup-agents-7f47c7986b-6z259:/var/fdb# fdbbackup start -d 'blobstore://timelordsfdbtest@minio-gateway-service:9000/testbackup?sc=0&bucket=testbucket' --log --trace_format json --knob_http_verbose_level=3
I get
ERROR: Could not create backup container: Operation timed out
ERROR: An error was encountered during submission
Fatal Error: Backup error

The trace logs shows no signs of HTTP requests sent, except this snippet:

Summary

{
“Severity”: “10”,
“Time”: “1603289138.352557”,
“Type”: “AsyncFileOpened”,
“ID”: “0000000000000000”,
“SuppressedEventCount”: “0”,
“Filename”: “/var/backup-credentials/azure_blobstore_credentials_for_minio”,
“Fd”: “14”,
“Flags”: “2228225”,
“Machine”: “10.244.158.11:22323”,
“LogGroup”: “default”
}
{
“Severity”: “40”,
“Time”: “1603289138.352762”,
“Type”: “SystemError”,
“ID”: “0000000000000000”,
“Error”: “unknown_error”,
“ErrorDescription”: “An unknown error occurred”,
“ErrorCode”: “4000”,
“Backtrace”: “addr2line -e fdbbackup.debug -p -C -f -i 0xaa654c 0xaa5e68 0xaa5f31 0xa69453 0x42fe8a 0x5d0808 0x5cefc8 0x5d0a8c 0x4f7c48 0x5d56bd 0x5d585c 0x4f7c48 0x5e3729 0x5d66c8 0x5d681d 0x9bf218 0x4f7bb0 0xaeac50 0x83c8b2 0x4b1469 0x7fb91832ab97”,
“Machine”: “10.244.158.11:22323”,
“LogGroup”: “default”
}
{
“Severity”: “20”,
“Time”: “1603289138.352762”,
“Type”: “BlobStoreEndpointRequestFailedRetryable”,
“ID”: “0000000000000000”,
“Error”: “unknown_error”,
“ErrorDescription”: “An unknown error occurred”,
“ErrorCode”: “4000”,
“SuppressedEventCount”: “0”,
“ConnectionEstablished”: “1”,
“RemoteHost”: “minio-gateway-service”,
“Verb”: “HEAD”,
“Resource”: “/testbucket”,
“ThisTry”: “1”,
“RetryDelay”: “2”,
“Machine”: “10.244.158.11:22323”,
“LogGroup”: “default”
}
{
“Severity”: “10”,
“Time”: “1603289138.352762”,
“Type”: “AsyncFileClosed”,
“ID”: “0000000000000000”,
“SuppressedEventCount”: “0”,
“Fd”: “14”,
“Machine”: “10.244.158.11:22323”,
“LogGroup”: “default”
}

Setup:

Kubernetes cluster in Azure
minIO service minio/minio:RELEASE.2020-10-18T21-54-12Z acting as a gateway to azure blobstore.
fbd-kubernetes-operator 0.18.0
foundationdb-cluster 6.2.20
FoundationDBBackup object 6.2.20

Logs from the kubernetes operator:

Summary

2020-10-21T10:46:39.203Z INFO controller Running command {“namespace”: “timeseries”, “cluster”: “foundationdb-cluster”, “path”: “/usr/bin/fdb/6.2/fdbbackup”, “args”: ["/usr/bin/fdb/6.2/fdbbackup", “status”, “–json”, “-C”, “/tmp/600276910”, “–log”, “–logdir”, “/var/log/fdb”]}

2020-10-21T10:46:39.482Z INFO controller Command completed {“namespace”: “timeseries”, “cluster”: “foundationdb-cluster”, “output”: "{“SchemaVersion”:“1…”}

2020-10-21T10:46:39.483Z INFO controller Running command {“namespace”: “timeseries”, “cluster”: “foundationdb-cluster”, “path”: “/usr/bin/fdb/6.2/fdbbackup”, “args”: ["/usr/bin/fdb/6.2/fdbbackup", “start”, “-d”, “blobstore://timelordsfdbtest@minio-gateway-service:9000/fdb-backup?bucket=fdb-backups&sc=0”, “-s”, “60”, “-z”, “-C”, “/tmp/249282357”, “–log”, “–logdir”, “/var/log/fdb”]}

2020-10-21T10:46:49.554Z ERROR controller Error from FDB command {“namespace”: “timeseries”, “cluster”: “foundationdb-cluster”, “code”: -1, “stdout”: “”, “stderr”: “”, “error”: “signal: killed”}

2020-10-21T10:46:49.554Z INFO controller Reconciliation terminated early {“namespace”: “timeseries”, “backup”: “foundationdb-cluster”, “lastAction”: “controllers.StartBackup”}

2020-10-21T10:46:49.554Z ERROR controller Error in reconciliation {“subReconciler”: “controllers.StartBackup”, “namespace”: “timeseries”, “backup”: “foundationdb-cluster”, “error”: “signal: killed”}

github.com/go-logr/zapr.(*zapLogger).Error

/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128

github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*CliAdminClient).runCommand

/workspace/controllers/admin_client.go:255

github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*CliAdminClient).StartBackup

/workspace/controllers/admin_client.go:509

github.com/FoundationDB/fdb-kubernetes-operator/controllers.StartBackup.Reconcile

/workspace/controllers/start_backup.go:46

github.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBBackupReconciler).Reconcile

/workspace/controllers/backup_controller.go:82

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211

k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1

/go/pkg/mod/k8s.io/apimachinery@v0.17.0/pkg/util/wait/wait.go:152

k8s.io/apimachinery/pkg/util/wait.JitterUntil

/go/pkg/mod/k8s.io/apimachinery@v0.17.0/pkg/util/wait/wait.go:153

k8s.io/apimachinery/pkg/util/wait.Until

/go/pkg/mod/k8s.io/apimachinery@v0.17.0/pkg/util/wait/wait.go:88

The backup pods connects to the foundationdb cluster (fdbcli --exec status on the backup agent pods show they’re healthy)

When I try to connect to minIO using aws client (installed manually on the backup agents)

aws --endpoint-url http://minio-gateway-service:9000 s3 ls

This command gives me the buckets in azure blobstore, and shows that the minIO azure gateway is s3 compatible. It also shows that minio-gateway-service:9000 is reachable from the pod. Note: it is not configured using SSL.

Furthermore, the file /var/backup-credentials/azure_blobstore_credentials_for_minio file contains

Summary
{
  "accounts" : {
    "timelordsfdbtest@minio-gateway-service" :   { "secret" : "7KeyHasBeenReplacedByThisSentenceA==" }
  }
}

Any ideas on what could be wrong, or other things I could try out, would be highly appreciated!

1 Like

The problem here is due to FDB choking when the length of secret key is > 64 characters. I have compiled a custom version of minio with hardcoded Azure Blob credentials, and configuring something short for FDB <-> minio authentication works.

3 Likes

Thanks, Petr, you saved me from lots of wasteful trial and failures!
Do you know if this is (going to be) fixed in any specific FDB version?

Is there an issue for this? I don’t recall hearing about this bug before…

I can confirm that a shorter secret key indeed fixed the issue, and I can now see backup files in azure blobstore :slight_smile:
(Have yet to verify the integrity of the backup, but at least it’s a step forward)

I used the same approach as you Petr-tichy, changed a few lines in minio to use an azure api key from environment variable, instead of supplied by the user.
Would be nice to have it fixed in foundationdb though, but you saved the day Petr-tichy :slight_smile:

I found the problem. In FDB’s blob client, the hmac_sha1() method handles the case where the key is shorter than the 64 byte block size but doesn’t handle the case where it is longer as per

I’ll fix this soon.

@petr-tichy @matiasholte I have a PR against release-6.2 with this fix, could either of you verify that it works?

Also @petr-tichy thanks so much for figuring out what the issue was.

I get errors when I compile foundationdb locally (not related to your fix), so I cannot test it until I have an image to download.
Shouldn’t be too hard to test, using minio with a long password?

No problem, yes I’ll test with minio today.

Yes, I can confirm this fixes the issue with authentication, but there is yet something else incompatible that I’m looking into. I suspect something with the partial file retrieval…

2 Likes

--knob_http_verbose_level=3 will dump full requests/responses to stdout. 2 will omit response content. I think the response content bytes are printed directly, no escaping.

Yes, that was it. And thanks @SteavedHams for the tip. The MinIO possibly inadvertenly forwards the Content-MD5 header back to FDB, but does that also for requests with the Range header, where the response body is just the requested file part, but the Content-MD5 is for the whole file. So all requests for files bigger than read_block_size would fail the check.
There is a simple workaround to set the rbs backup URL parameter to something high (2^30 worked for me). This allowed me to restore 5 TB from Azure Blob Storage.

I scanned the AWS S3 specs but found no mention of the Content-MD5 on responses. It looks like AWS S3 never uses it. (although it is used by the client in requests). So I suspect that this is the fix: https://github.com/apple/foundationdb/pull/3978

To be clear, this workaround actually means that every read of any block of a backup file by an FDB restore job will actually read the entire file and then just extract the part it is interested in and discard the rest. In the current restore process, backup files are read in parallel as each 1MB block of a backup file is independently decodable and processed in a random order to enable greater restore parallelism and workload balancing across database shards.