FoundationDB backups to a CIFS-mounted filesystem

I’m configuring FoundationDB backups for a production cluster. The backups are to be stored on a SAN, which has been mounted as follows:

//192.168.1.10/backup /backup cifs nobrl,uid=foundationdb,credentials=/root/backup.cred 0 0

This has been done on all servers in the cluster.

Running

fdbbackup start -d file:///backup -w -s 0

the backup writes to the subdirectory, but it hangs and never completes. Running fdbbackup status shows:

The backup on tag `default' is in progress to file:///backup/backup-2020-09-01-00-06-03.049319.
BackupUID: ae1d8c3fcfd99e837fcf828415a27ec1
BackupURL: file:///backup/backup-2020-09-01-00-06-03.049319
Snapshot interval is 0 seconds.  The initial snapshot is still running.

Details:
 LogBytes written - 0
 RangeBytes written - 0
 Last complete log version and timestamp        - 27051456378213, 2020/09/01.00:07:38+0000
 Last complete snapshot version and timestamp   - N/A, N/A
 Current Snapshot start version and timestamp   - 27051456378213, 2020/09/01.00:07:38+0000
 Expected snapshot end version and timestamp    - 27051456378213, 2020/09/01.00:07:38+0000
 Backup supposed to stop at next snapshot completion - Yes
Recent Errors (since initialization)
8 second(s) ago : 'Disk i/o operation failed' on 'file_backup_write_range_5.2'

This is also shown in the logs on each cluster server:

<Event Severity="20" Time="1598917680.404502" Type="TB_ExecuteFailure" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" TaskUID="275ba25035da7df14a168b920035b9c0" TaskType="file_backup_write_range_5.2" Priority="1" Machine="182.168.1.10:6586" LogGroup="default" />
<Event Severity="20" Time="1598917680.404502" Type="FileBackupError" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" BackupUID="64faf42be020cb1b" Description="'Disk i/o operation failed' on 'file_backup_write_range_5.2'" TaskInstance="0" Machine="192.168.1.10:6586" LogGroup="default" />

Backups to a local disk directory work fine; only backups to the CIFS-mounted directory fail. Any suggestions for CIFS mount options that would work around the I/O error would be helpful.

You should see an event with more details about the specific operation that failed adjacent to those two messages in the logs.
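As a rough sketch of how to pull those adjacent events out of a trace file, the snippet below parses each `<Event ... />` line as XML and keeps the ones that carry an `Error` attribute. The function name `error_events` and the sample list are illustrative only; the sample event is an abbreviated copy of one posted earlier in this thread, and real trace files may need different filtering.

```python
import xml.etree.ElementTree as ET


def error_events(lines):
    """Yield the attribute dict of every trace event that has an Error attribute."""
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue  # skip XML headers, blank lines, and anything else
        try:
            event = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip truncated or malformed lines
        if "Error" in event.attrib:
            yield dict(event.attrib)


# Abbreviated copy of an event from this thread, plus one non-event line.
sample = [
    "not an event line",
    '<Event Severity="20" Time="1598917680.404502" Type="FileBackupError" '
    'Error="io_error" ErrorCode="1510" Machine="192.168.1.10:6586" />',
]

for attrs in error_events(sample):
    # UnixError is only present on some events, so fall back to an empty string.
    print(attrs["Time"], attrs["Type"], attrs["Error"], attrs.get("UnixError", ""))
```

To scan a real log, pass an open file object instead of `sample` and look at the events whose `Time` is close to the `FileBackupError` entry.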

Hi Steve,
Yes, you’re right:

<Event Severity="10" Time="1598918862.161181" Type="SyncError" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" Fd="19" Result="-1" UnixErrorCode="16" UnixError="Invalid argument" Machine="192.168.1.10:6586" LogGroup="default" />

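This suggests that the cause is fsync() failing. The code and the text in the event only agree if the code is hexadecimal: “Invalid argument” is EINVAL, which is errno 22 (0x16), whereas errno 16 in decimal would be EBUSY.
I’ve attempted the backup with the CIFS option “nostrictsync”, but the backup still does not complete.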
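The SyncError event reports UnixErrorCode="16" together with UnixError="Invalid argument". A quick way to see which errno each reading of that code corresponds to is to ask the local C library. This is a minimal sketch; `describe` is a hypothetical helper name, and the assumption that the code might be printed in hexadecimal comes only from the mismatch in the log line.

```python
import errno
import os


def describe(code: int) -> str:
    """Return the errno number, symbolic name, and message for `code`."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} {name}: {os.strerror(code)}"


# Read "16" as a decimal errno value.
print(describe(16))

# Read "16" as a hexadecimal errno value (0x16 == 22).
print(describe(0x16))
```

On Linux the second reading is EINVAL, which matches the “Invalid argument” text in the event.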

I don’t know enough about CIFS to know why fsync (or fdatasync) does not work, but I don’t recommend using a backup medium that does not support it.
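To check whether a mount accepts fsync() at all, independent of FoundationDB, something like the following probe could be run against the backup directory. It is only a sketch: `probe_fsync` is a hypothetical helper, it does not reproduce FoundationDB's own I/O path, and a success here does not prove that backups will work.

```python
import errno
import os
import tempfile


def probe_fsync(directory: str) -> str:
    """Write a small temporary file in `directory`, fsync it, and report the result."""
    fd, path = tempfile.mkstemp(prefix="fsync-probe-", dir=directory)
    try:
        os.write(fd, b"fsync probe\n")
        try:
            os.fsync(fd)
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "UNKNOWN")
            return f"fsync failed: errno {exc.errno} ({name}): {exc.strerror}"
        return "fsync succeeded"
    finally:
        # Always clean up the probe file, whether or not fsync worked.
        os.close(fd)
        os.unlink(path)


# Demo against the system temp directory; replace with "/backup" on a cluster host.
print(probe_fsync(tempfile.gettempdir()))
```

Running it as the foundationdb user against /backup, with and without mount options such as nostrictsync, would show whether the failure can be reproduced outside of the backup agent.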

Unfortunately I’ve been unable to spend any further time investigating why CIFS failed in this case. I switched to NFS for the network share, and backups now complete successfully.

Sorry I won’t be much help for future readers :) However, if someone solves this issue in the future, I’d appreciate knowing how.

A quick Google search suggests that CIFS doesn’t actually support fsync(), and that some applications allow ignoring the error. I cannot recommend or endorse that here: without fsync(), client-side or even server-side write buffering could prevent backup data from actually reaching disk if a process or host fails.

Backing up to NFS may also have some issues: https://github.com/apple/foundationdb/issues/3522#issuecomment-702185060