Error "signal: killed" when recovering from backup

When I try to restore my database from a backup using the Kubernetes operator, I get the following error in the operator log:

{"level":"info","ts":1631609197.321,"logger":"fdbclient","msg":"Running command","namespace":"verifier","cluster":"sample-cluster","path":"/usr/bin/fdb/6.3/fdbrestore","args":["/usr/bin/fdb/6.3/fdbrestore","start","-r","blobstore://s3-proxy-user@fdb-backup-s3-proxy:80/default?bucket=foundationdbbackup&secure_connection=0","--dest_cluster_file","/tmp/405498555","--log","--logdir","/var/log/fdb"]}
{"level":"error","ts":1631609207.3601973,"logger":"fdbclient","msg":"Error from FDB command","namespace":"verifier","cluster":"sample-cluster","code":-1,"stdout":"No restore target version given, will use maximum restorable version from backup description.\nUsing target restore version 1207990314200\n","stderr":"","error":"signal: killed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/log.(*DelegatingLogger).Error\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/log/deleg.go:144\ngithub.com/FoundationDB/fdb-kubernetes-operator/fdbclient.(*cliAdminClient).runCommand\n\t/workspace/fdbclient/admin_client.go:187\ngithub.com/FoundationDB/fdb-kubernetes-operator/fdbclient.(*cliAdminClient).StartRestore\n\t/workspace/fdbclient/admin_client.go:555\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.StartRestore.Reconcile\n\t/workspace/controllers/start_restore.go:48\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBRestoreReconciler).Reconcile\n\t/workspace/controllers/restore_controller.go:77\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1631609207.3605845,"logger":"controller","msg":"Error in reconciliation","namespace":"verifier","restore":"sample-cluster","subReconciler":"controllers.StartRestore","requeueAfter":0,"error":"signal: killed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.processRequeue\n\t/workspace/controllers/controllers.go:114\ngithub.com/FoundationDB/fdb-kubernetes-operator/controllers.(*FoundationDBRestoreReconciler).Reconcile\n\t/workspace/controllers/restore_controller.go:82\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1631609207.3613954,"logger":"controller-runtime.manager.controller.foundationdbrestore","msg":"Reconciler error","reconciler group":"apps.foundationdb.org","reconciler kind":"FoundationDBRestore","name":"sample-cluster","namespace":"verifier","error":"signal: killed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:302\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.4/pkg/util/wait/wait.go:99"}

When I run the fdbrestore command manually (from the backup agent pod), the restore works without problems.

The s3-proxy-user@fdb-backup-s3-proxy part of the blobstore URL points to a proxy that exposes an S3-compatible API in front of Azure Blob Storage.

Any ideas what the problem might be here, or how I can debug this?

I would spin up a FoundationDB Pod with the same version, mount the cluster file, exec into the container, and run the same command; the trace logs should then contain more information. If you already collect the trace logs of the operator Pod, you can also look at those directly.

One thing that comes to mind (since this is a kill) is that the default timeout is being hit and the Go program is killing the fdbrestore process. If you can run the command in a separate Pod without any issues, I would assume that this is the problem. Given the ~10 seconds between the first log line and the second line with the kill (and you're right, the default timeout is 10 seconds), that's probably the case.

Feel free to create an issue on GitHub and file a PR that resolves it. The backup and restore feature is still in an alpha stage.
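For context on why the error reads exactly "signal: killed": when a Go program runs a subprocess through os/exec with a context that has a deadline, the child is sent SIGKILL once the deadline expires, and the returned error is the string "signal: killed". The snippet below is a minimal, self-contained sketch of that behavior, not the operator's actual code; the 10-second timeout mirrors the default mentioned above, and the sleep command is just a stand-in for a long-running fdbrestore invocation.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// A 10-second deadline, mirroring the default CLI timeout discussed above.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Stand-in for a long-running fdbrestore call (sleeps longer than the deadline).
	cmd := exec.CommandContext(ctx, "sleep", "30")
	out, err := cmd.CombinedOutput()
	if err != nil {
		// Once the deadline expires, Go kills the child with SIGKILL and the
		// command's error reports "signal: killed" — matching the operator log.
		fmt.Printf("output: %q, error: %v\n", out, err)
	}
}
```

If the sketch matches what you see, the fix would be to allow the restore command more time than the default before the context kills it.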
