Using the new bulkdump/bulkload feature

Hi! FDB version 7.4.0 added a new bulkdump/bulkload feature. I tried to use it, but it doesn't work.

To reproduce:

> configure new single ssd
> writemode on
> set foo bar
> bulkdump mode on
> bulkdump dump \x00 \xff /var/fdb/dump
Received Job ID: XXX

Then I dropped the DB and tried the load on a clean instance.

> configure new single ssd
> bulkload mode on
> bulkload load XXX \x00 \xff /var/fdb/dump
Received Job ID: XXX
> bulkload status
Running bulk loading job: XXX
Job information: [BulkLoadJobState]: [JobId]: XXX, [JobRoot]: /var/fdb/dump, [JobRange]: { begin=\x00  end=\xff }, [Phase]: Submitted, [TransportMethod]: LocalFileCopy, [SubmitTime]: 1744360591.077593, [SinceSubmitMins]: 0.333240
Submitted 0 tasks
Finished 0 tasks
Error 0 tasks
Total task count is unknown

As a result, the job hangs in the Submitted phase.

Hi, can you check whether the knobs ENABLE_READ_LOCK_ON_RANGE and SHARD_ENCODE_LOCATION_METADATA are enabled in your cluster? The bulkload feature relies on both. Also, can you check whether the bulkdump completed by looking for job-manifest.txt in the output folder?
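For example, here is a minimal shell sketch of that manifest check. The dump root and job-ID folder layout are assumptions based on this thread (substitute the JobRoot you passed to `bulkdump dump`); the mock setup only makes the snippet runnable anywhere:

```shell
# Sketch: check whether the bulkdump wrote its job-manifest.txt.
# On your cluster, point DUMP_ROOT at the real JobRoot (e.g. /var/fdb/dump)
# instead of creating the mock layout below.
DUMP_ROOT="$(mktemp -d)"                   # stand-in for /var/fdb/dump
mkdir -p "$DUMP_ROOT/XXX"                  # XXX = the bulkdump job ID
touch "$DUMP_ROOT/XXX/job-manifest.txt"    # simulate a completed dump

# The actual check: a completed dump leaves a job-manifest.txt
# somewhere under the dump root; no output means the dump never finished.
MANIFEST="$(find "$DUMP_ROOT" -name job-manifest.txt)"
echo "$MANIFEST"
```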

Thanks!

[fdbserver]
command = /usr/bin/fdbserver
public-address = auto:$ID
listen-address = public
datadir = /var/fdb/data
logdir = /var/fdb/logs
knob_disable_posix_kernel_aio = 1
knob_enable_read_lock_on_range = 1
knob_shard_encode_location_metadata = 1

But I still see the same behaviour:

> bulkload status
Running bulk loading job: XXX
Job information: [BulkLoadJobState]: [JobId]: XXX, [JobRoot]: /var/fdb/dump, [JobRange]: { begin=\x00  end=\xff }, [Phase]: Submitted, [TransportMethod]: LocalFileCopy, [SubmitTime]: 1744785889.619496, [SinceSubmitMins]: 0.112391

And when I cancel the job:

> bulkload cancel XXX
> bulkload status
No bulk loading job is running
> bulkload history
Assertion job.getTaskCount().present() failed @ /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbcli/BulkLoadCommand.actor.cpp 54:
  addr2line -e fdbcli.debug -p -C -f -i 0x18f5f1b 0xaa242e 0xaa2ab3 0x10634c8 0x1062b6e 0xaa16b8 0xaa1542 0x13432c4 0x1331480 0x1342228 0x1342082 0x13415b8 0x134144a 0x1334b01 0x1336ac8 0x133698c 0xb23a78 0xb23010 0xb3ae58 0xb3abaa 0x1792b5e 0x179241c 0x193ecc8 0x11ae3a6 0x16c7a24 0x112b9c3 0xe626de 0x7fab0f8bed90
ERROR: An internal error occurred (4100)

It looks like the bulkload job failed to initialize. Can you check for the existence of the job-manifest.txt file in the bulkdump folder? Also, can you check whether the job ID you passed to the bulkload command matches the bulkdump job ID (i.e., the dataset ID) you want to load?

For your convenience, you may want to check our latest “quick start” doc in the PR: documentation/sphinx/source/bulkload-user.rst

Please let me know if you have further issues. Thanks!

Can you help to check the existence of the job-manifest.txt file in the bulkdump folder?

Yes, it exists, and the IDs are the same.

The “quick start” doc says:

Start a sufficient number of SSs because too few can cause bulkload fail (In the above we started 8 SSs).

I started more than one SS, and the job finished successfully.
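For reference, here is a minimal foundationdb.conf sketch for running several server processes on one host, in the same format as the config above. The ports are illustrative; each `[fdbserver.<port>]` section starts one process that inherits the `[fdbserver]` defaults, and note that `datadir` must include `$ID` so the processes do not share a data directory:

```ini
[fdbserver]
command = /usr/bin/fdbserver
public-address = auto:$ID
listen-address = public
datadir = /var/fdb/data/$ID
logdir = /var/fdb/logs
knob_enable_read_lock_on_range = 1
knob_shard_encode_location_metadata = 1

# One process per section; with a storage-heavy layout, each extra
# process becomes an additional storage server (SS).
[fdbserver.4500]
[fdbserver.4501]
[fdbserver.4502]
[fdbserver.4503]
```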

Thank you.
