fuCtor
(Alexey Shcherbakov)
April 11, 2025, 8:45am
1
Hi! FDB version 7.4.0 added a new bulkdump/bulkload feature. I tried to use it, but it doesn't work.
To reproduce:
> configure new single ssd
> writemode on
> set foo bar
> bulkdump mode on
> bulkdump dump \x00 \xff /var/fdb/dump
Received Job ID: XXX
Then I dropped the DB and tried on a clean instance:
> configure new single ssd
> bulkload mode on
> bulkload load XXX \x00 \xff /var/fdb/dump
Received Job ID: XXX
> bulkload status
Running bulk loading job: XXX
Job information: [BulkLoadJobState]: [JobId]: XXX, [JobRoot]: /var/fdb/dump, [JobRange]: { begin=\x00 end=\xff }, [Phase]: Submitted, [TransportMethod]: LocalFileCopy, [SubmitTime]: 1744360591.077593, [SinceSubmitMins]: 0.333240
Submitted 0 tasks
Finished 0 tasks
Error 0 tasks
Total task count is unknown
As a result, the job hangs in the Submitted phase.
kakaiu
(Zhe Wang)
April 16, 2025, 5:24am
2
Hi, can you check whether the knobs ENABLE_READ_LOCK_ON_RANGE and SHARD_ENCODE_LOCATION_METADATA are enabled in your cluster? The bulkload feature relies on these two knobs. Also, can you check whether the bulkdump completed by looking for job-manifest.txt in the output folder?
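If they are not enabled, here is a minimal sketch of turning them on in foundationdb.conf (knob names are lower-cased with a knob_ prefix in the conf, and all fdbserver processes need a restart for knobs to take effect):
[fdbserver]
knob_enable_read_lock_on_range = 1
knob_shard_encode_location_metadata = 1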
Thanks!
fuCtor
(Alexey Shcherbakov)
April 16, 2025, 6:48am
3
[fdbserver]
command = /usr/bin/fdbserver
public-address = auto:$ID
listen-address = public
datadir = /var/fdb/data
logdir = /var/fdb/logs
knob_disable_posix_kernel_aio = 1
knob_enable_read_lock_on_range = 1
knob_shard_encode_location_metadata = 1
But I see the same behaviour:
> bulkload status
Running bulk loading job: XXX
Job information: [BulkLoadJobState]: [JobId]: XXX, [JobRoot]: /var/fdb/dump, [JobRange]: { begin=\x00 end=\xff }, [Phase]: Submitted, [TransportMethod]: LocalFileCopy, [SubmitTime]: 1744785889.619496, [SinceSubmitMins]: 0.112391
And when I cancel the job:
> bulkload cancel XXX
> bulkload status
No bulk loading job is running
> bulkload history
Assertion job.getTaskCount().present() failed @ /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbcli/BulkLoadCommand.actor.cpp 54:
addr2line -e fdbcli.debug -p -C -f -i 0x18f5f1b 0xaa242e 0xaa2ab3 0x10634c8 0x1062b6e 0xaa16b8 0xaa1542 0x13432c4 0x1331480 0x1342228 0x1342082 0x13415b8 0x134144a 0x1334b01 0x1336ac8 0x133698c 0xb23a78 0xb23010 0xb3ae58 0xb3abaa 0x1792b5e 0x179241c 0x193ecc8 0x11ae3a6 0x16c7a24 0x112b9c3 0xe626de 0x7fab0f8bed90
ERROR: An internal error occurred (4100)
kakaiu
(Zhe Wang)
April 16, 2025, 10:26pm
4
Looks like the bulkload job failed to initialize. Can you check whether the job-manifest.txt file exists in the bulkdump folder? Also, can you check whether the job ID you pass to the bulkload command matches the bulkdump job ID (i.e. the dataset ID) you want to load?
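For instance, something along these lines (a sketch assuming the dump was written under /var/fdb/dump and that the manifest sits in a per-job subdirectory; the exact layout may differ between builds):
find /var/fdb/dump -name job-manifest.txt
# The per-job directory should carry the same job ID that you
# then pass to `bulkload load <JobId> ...`.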
For your convenience, you may want to check our latest “quick start” doc in the PR: documentation/sphinx/source/bulkload-user.rst
Please let me know if you have further issues. Thanks!
fuCtor
(Alexey Shcherbakov)
April 17, 2025, 7:55am
5
Can you check whether the job-manifest.txt file exists in the bulkdump folder?
Yes, it exists, and the IDs match.
From the “quick start” doc:
Start a sufficient number of SSs because too few can cause bulkload fail (In the above we started 8 SSs).
I started more than one SS and the job finished successfully.
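For anyone who hits the same hang, this is roughly what “more than one SS” looks like in foundationdb.conf (a sketch: each [fdbserver.<port>] section starts one fdbserver process, the four ports below are just an example, and $ID keeps the per-process data directories separate):
[fdbserver]
command = /usr/bin/fdbserver
public-address = auto:$ID
listen-address = public
datadir = /var/fdb/data/$ID
logdir = /var/fdb/logs
knob_enable_read_lock_on_range = 1
knob_shard_encode_location_metadata = 1

[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]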
Thank you.