FoundationDB

[RESOLVED?] Fdbbackup 'Platform error' on 'file_backup_write_range'


#1

TL;DR fdbbackup seems to require permission for parent directory of a backup url. (e.g for file:///path/to/backup, permission of /path/to do matter)
UPDATE: It seems that parent of parent directory affect the error.


I’m trying to execute fdbbackup.

Using the latest docker file of Simple Dockerfile.

Start a server.

$ docker run --init --rm --name=fdb-a -v fdb-a:/var/lib/foundationdb hiroshi3110/foundationdb:5.1.5-1_ubuntu-16.04 ./start.sh
Configure new cluster.
/etc/init.d/foundationdb: line 94: log_daemon_msg: command not found
/etc/init.d/foundationdb: line 100: log_end_msg: command not found
/etc/foundationdb/fdb.cluster is now using address 172.17.0.2
Time="1526304215.233771" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 5.1 (v5.1.5)
Database created

Then try to do fdbbackup…

$ docker exec -ti fdb-a bash
root@93f96ad9d4c0:~# mkdir backup
root@93f96ad9d4c0:~# chmod a+w backup/
root@93f96ad9d4c0:~# fdbbackup status
No previous backups found.

root@93f96ad9d4c0:~# fdbbackup start -d file:///root/backup
The backup on tag `default' was successfully submitted.
root@93f96ad9d4c0:~# fdbbackup status
The backup on tag `default' is in progress to file:///root/backup/backup-2018-05-14-13-26-05.896971.
Snapshot interval is 864000 seconds.  The initial snapshot is still running.

Details:
 LogBytes written - 0
 RangeBytes written - 0
 Last complete log version and timestamp        - 149804665, 05/14/18 13:26:06
 Last complete snapshot version and timestamp   - N/A, N/A
 Current Snapshot start version and timestamp   - 149804665, 05/14/18 13:26:06
 Expected snapshot end version and timestamp    - 149804665, 05/14/18 13:26:06
 Backup supposed to stop at next snapshot completion - Yes
Recent Errors (since initialization)
11 second(s) ago : 'Platform error' on 'file_backup_write_range'

root@93f96ad9d4c0:~# ls -l backup/
total 0

How can I do fdbbackup properly?
Thanks in advanced as always.


(A.J. Beamon) #2

Would be willing to send the log file from the failing process (either here or via direct message)? By default, they are configured to go to /var/log/foundationdb, and you’d want to find the file that has Severity="40" events. You could probably also grep for platform_error.

The file may be big, in which case an excerpt from around the error will likely be sufficient in this case.


#3

I found following lines in /var/log/foundationdb/trace.172.017.000.002.30.1526304215.so2pAv.1.xml

...
<Event Severity="10" Time="1526304366.223816" Type="FileBackupSnapshotDispatchStats" Machine="172.17.0.2:30" ID="0000000000000000" BackupUID="609254dc38faeaaf" AllShards=
"1" ShardsDone="0" ShardsNotDone="1" ExpectedShardsDone="1" ShardsToDispatch="1" SnapshotBeginVersion="149804665" SnapshotTargetEndVersion="149804665" NextDispatchVersion
="149834639" CurrentVersion="149834639" TimeElapsed="1" SnapshotIntervalSeconds="864000" logGroup="default"/>
<Event Severity="10" Time="1526304366.223816" Type="FileBackupSnapshotDispatchAddingTasks" Machine="172.17.0.2:30" ID="0000000000000000" TasksToAdd="1" NewBatchSize="1" S
uppressedEventCount="0" logGroup="default"/>
<Event Severity="10" Time="1526304366.225079" Type="FileBackupSnapshotRangeDispatched" Machine="172.17.0.2:30" ID="0000000000000000" BackupUID="609254dc38faeaaf" CurrentV
ersion="149834639" ScheduledVersion="-1" BeginKey="" EndKey="\xff" SuppressedEventCount="0" logGroup="default"/>
<Event Severity="10" Time="1526304366.229581" Type="FileBackupSnapshotDispatchFinished" Machine="172.17.0.2:30" ID="0000000000000000" BackupUID="609254dc38faeaaf" AllShar
ds="1" ShardsDone="1" ShardsNotDone="0" SnapshotBeginVersion="149804665" SnapshotTargetEndVersion="149804665" CurrentVersion="149834639" SnapshotIntervalSeconds="864000" 
logGroup="default"/>
<Event Severity="10" Time="1526304366.230304" Type="FileBackupLogDispatch" Machine="172.17.0.2:30" ID="0000000000000000" BeginVersion="149804665" EndVersion="169804665" R
estorableVersion="-1" SuppressedEventCount="0" logGroup="default"/>
<Event Severity="10" Time="1526304366.231038" Type="FileBackupLogsDispatchContinuing" Machine="172.17.0.2:30" ID="0000000000000000" BackupUID="609254dc38faeaaf" BeginVersion="149804665" EndVersion="169804665" SuppressedEventCount="0" logGroup="default"/>
<Event Severity="10" Time="1526304366.244016" Type="FileBackupRangeStart" Machine="172.17.0.2:30" ID="0000000000000000" BackupUID="609254dc38faeaaf" BeginKey="" EndKey="\xff" TaskKey="a0d28056c6965e4726bb7dfb52aca5c4" SuppressedEventCount="0" logGroup="default"/>
<Event Severity="40" Time="1526304366.245468" Type="AbsolutePathError" Machine="172.17.0.2:30" ID="0000000000000000" Filename="/root/backup/backup-2018-05-14-13-26-05.896971" UnixErrorCode="d" UnixError="Permission denied" logGroup="default" Backtrace="addr2line -e backup_agent.debug -p -C -f -i 0x9283c4 0x927492 0x913525 0x48b6c7 0x47b459 0x4642cb 0x54057f 0x5406b6 0x74ab50 0x75265d 0x752822 0x74ab50 0x756fc6 0x75878a 0x664520 0x69cfdd 0x6f1960 0x69a35f 0x69a49a 0x6833d8 0x682d58 0x682f4d 0x69ee30 0x70ecf8 0x81cce8 0x81cf39 0x457820 0x8fd315 0x648b2e 0x417728 0x7f2c83a02830"/>
<Event Severity="20" Time="1526304366.245468" Type="BackupContainer" Machine="172.17.0.2:30" ID="0000000000000000" Description="Invalid container specification.  See help." URL="file:///root/backup/backup-2018-05-14-13-26-05.896971" logGroup="default"/>
<Event Severity="20" Time="1526304366.245468" Type="TB_ExecuteFailure" Machine="172.17.0.2:30" ID="0000000000000000" TaskUID="a0d28056c6965e4726bb7dfb52aca5c4" TaskType="file_backup_write_range" Priority="0" Error="platform_error" ErrorDescription="Platform error" ErrorCode="1500" logGroup="default"/>
<Event Severity="20" Time="1526304366.245468" Type="FileBackupError" Machine="172.17.0.2:30" ID="0000000000000000" Error="platform_error" ErrorDescription="Platform error" ErrorCode="1500" BackupUID="609254dc38faeaaf" Description="'Platform error' on 'file_backup_write_range'" TaskInstance="0" logGroup="default"/>
<Event Severity="40" Time="1526304366.259559" Type="AbsolutePathError" Machine="172.17.0.2:30" ID="0000000000000000" Filename="/root/backup/backup-2018-05-14-13-26-05.896971" UnixErrorCode="d" UnixError="Permission denied" logGroup="default" Backtrace="addr2line -e backup_agent.debug -p -C -f -i 0x9283c4 0x927492 0x913525 0x48b6c7 0x47b459 0x4642cb 0x431aeb 0x74ab50 0x75265d 0x752822 0x74ab50 0x756fc6 0x75878a 0x664520 0x69cfdd 0x6f1960 0x69a35f 0x69a49a 0x6833d8 0x682d58 0x682f4d 0x69ee30 0x70ecf8 0x81cce8 0x81cf39 0x457820 0x8fd315 0x648b2e 0x417728 0x7f2c83a02830"/>
<Event Severity="20" Time="1526304366.259559" Type="BackupContainer" Machine="172.17.0.2:30" ID="0000000000000000" Description="Invalid container specification.  See help." URL="file:///root/backup/backup-2018-05-14-13-26-05.896971" logGroup="default"/>
<Event Severity="20" Time="1526304366.259559" Type="TB_ExecuteFailure" Machine="172.17.0.2:30" ID="0000000000000000" TaskUID="3706d5a30d183de9eb6476f1fc1479dd" TaskType="file_backup_write_logs" Priority="0" Error="platform_error" ErrorDescription="Platform error" ErrorCode="1500" logGroup="default"/>
<Event Severity="20" Time="1526304366.259559" Type="FileBackupError" Machine="172.17.0.2:30" ID="0000000000000000" Error="platform_error" ErrorDescription="Platform error" ErrorCode="1500" BackupUID="609254dc38faeaaf" Description="'Platform error' on 'file_backup_write_logs'" TaskInstance="0" logGroup="default"/>
...

There are “Permission denied” and “Invalid container specification”.


(A.J. Beamon) #4

I believe the backup agents are being run with the group and user configured in foundationdb.conf. Does that user have permissions to write to the location specified? If that is indeed the problem, then I think the error message in backup status failed us a little bit here, and I can file an issue to better indicate what went wrong in status. We may still do that if your problem isn’t an issue with permissions, but in that case we should try to figure out what’s happening first.


#5

Ah, I figured out what problem is and I got fdbbackup to work!

For the backup url file:///root/backup, last time I did only chmod a+w /root/backup, but it requireds permission for parent directory, /root.

$ docker exec -ti fdb-a bash
root@bac49c7d0f29:~# mkdir backup
root@bac49c7d0f29:~# chmod a+w backup/
root@bac49c7d0f29:~# ls -l backup
drwxrwxrwx 2 root root    4096 May 16 00:25 backup
root@bac49c7d0f29:~# fdbbackup status
No previous backups found.

root@bac49c7d0f29:~# ls -ld /root
drwx------ 1 root root 4096 May 16 00:25 /root
root@bac49c7d0f29:~# chmod 777 /root
root@bac49c7d0f29:~# ls -ld /root
drwxrwxrwx 1 root root 4096 May 16 00:25 /root
root@bac49c7d0f29:~# fdbbackup start -d file:///root/backup
The backup on tag `default' was successfully submitted.
root@bac49c7d0f29:~# fdbbackup status
The backup on tag `default' is in progress to file:///root/backup/backup-2018-05-16-00-27-23.332032.
Snapshot interval is 864000 seconds.  The initial snapshot is still running.

Details:
 LogBytes written - 0
 RangeBytes written - 13
 Last complete log version and timestamp        - 119836947, 05/16/18 00:27:22
 Last complete snapshot version and timestamp   - 119898807, 05/16/18 00:27:22
 Current Snapshot start version and timestamp   - 119836947, 05/16/18 00:27:22
 Expected snapshot end version and timestamp    - 119836947, 05/16/18 00:27:22
 Backup supposed to stop at next snapshot completion - Yes

root@bac49c7d0f29:~# ls /root/backup/
backup-2018-05-16-00-27-23.332032

Anyway, thanks for your help.


#6

When I try to do fdbbackup start -d file:///root/backup/backup. Not only the permission of /root/backup affect the error, chmod 777 /root did the fix.


#7

Well, I got a backup anyhow, archive it as tar.gz and copy it to another docker foundation db container, extract the archive then…

root@0b91e6da6b0c:~# fdbrestore start -r file:///root/backup/backup/backup-2018-05-16-12-26-35.824367 -w
Backup Description
URL: file:///root/backup/backup/backup-2018-05-16-12-26-35.824367
Restorable: true
Snapshot:  startVersion=286935011130 (2018-05-19 20:46:01)  endVersion=287254409265 (2018-05-19 20:51:21)  totalBytes=181474532  restorable=true
SnapshotBytes: 181474532
MinLogBeginVersion:      285796554561 (2018-05-19 20:27:03)
ContiguousLogEndVersion: 287267143137 (2018-05-19 20:51:33)
MaxLogEndVersion:        287267143137 (2018-05-19 20:51:33)
MinRestorableVersion:    287254409265 (2018-05-19 20:51:21)
MaxRestorableVersion:    287267143137 (2018-05-19 20:51:33)
Restoring backup to version: 287267143137
Tag: default  UID: 9c1e9fd9e45725299a8fecb4ee165496  State: queued  Blocks: 0/0  BlocksInProgress: 0  Files: 0  BytesWritten: 0  ApplyVersionLag: 0  LastError: None
Tag: default  UID: 9c1e9fd9e45725299a8fecb4ee165496  State: queued  Blocks: 0/0  BlocksInProgress: 0  Files: 0  BytesWritten: 0  ApplyVersionLag: 0  LastError: ''Platform error' on 'restore_start'' 0s ago.

I got familiar ''Platform error' on 'restore_start''.

I did chmod -R 777 /root/backup to be sure the backup directory tree can read from anyone.


(Steve Atherton) #8

It’s most likely still a permission issue. The ‘restore_start’ task first lists all of the files in the backup path and stages a list of blocks to load in a queue in the database. Can you verify that the user running the backup agents can read and list files at /root/backup/backup/backup-2018-05-16-12-26-35.824367?


#9

After
fdbrestore start -r file:///root/backup/backup/backup-2018-05-16-12-26-35.824367

I changed permission of the base directory,
chmod 777 /root made a trick like fdbbackup.

This get fdbrestore be running.

root@0b91e6da6b0c:~# fdbrestore status
Tag: default  UID: 9c1e9fd9e45725299a8fecb4ee165496  State: running  Blocks: 16/223  BlocksInProgress: 207  Files: 115  BytesWritten: 15058445  ApplyVersionLag: 0  LastError: ''Platform error' on 'restore_start'' 71s ago.
  URL: file:///root/backup/backup/backup-2018-05-16-12-26-35.824367  Begin: ''  End: '\xff'  AddPrefix: ''  RemovePrefix: ''  Version: 287267143137