Really excited to see this bulkload feature released! I’m currently testing it on a cluster with 4 SSD nodes and 8 processes (1 stateless + 1 transaction + 6 storage).
fdb> status

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-rocksdb-v1
  Log engine             - ssd-2
  Encryption at-rest     - disabled
  Coordinators           - 1
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 8
  Zones                  - 1
  Machines               - 1
  Memory availability    - 30.9 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 04/18/25 11:40:31

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 24.493 GB
  Disk space used        - 8.888 GB

Operating space:
  Storage server         - 502.9 GB free on most full server
  Log server             - 505.9 GB free on most full server

Workload:
  Read rate              - 15 Hz
  Write rate             - 0 Hz
  Transactions started   - 6 Hz
  Transactions committed - 0 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 04/18/25 11:40:31
I bulkdumped the existing data from FDB to local storage (approximately 15 GB), then reloaded it back into FDB using bulkload. The bulkload process took about 11 minutes.
Job fe0bcec23eec9d38955ebf36c07791ed submitted at 1744946248.635041 for range { begin= end=\xff }. The job has 858 tasks. The job ran for 11.368590 mins and exited with status Complete.
I’m wondering whether this speed meets expectations. Also, are there any performance benchmarks available for this feature? Any other configuration suggestions? Thanks!
Thanks for your interest in BulkLoad/Dump. How many SSes are you using? What values did you set for the knobs DD_BULKDUMP_PARALLELISM, DD_BULKLOAD_PARALLELISM, and MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK in this run?
Currently, you can tune these three knobs to maximize bulkload/dump throughput for your cluster setting.
DD_BULKDUMP_PARALLELISM is the maximum number of parallel bulkdump tasks at any time, and DD_BULKLOAD_PARALLELISM is the maximum number of parallel bulkload tasks at any time. You may want to increase these two knobs to fully leverage your cluster’s parallelism (i.e., the number of SSes). Currently, the load-balancing policy of bulk loading is not optimal, so I recommend setting the parallelism knobs larger than the number of SSes.
In the bulkload mechanism, DD dispatches tasks to SSes in batches, and MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK is the batch size. If you see that DD is very busy but the SSes are not, consider increasing this knob.
In general, to maximize bulkload throughput, we want to increase the parallelism as much as possible so that all SSes are loading data at any given time, while keeping DD from getting too busy so that this centralized role does not become the bottleneck of the mechanism.
In my test, the SS count was 100 in a single DC with SQLite. I increased the two parallelism knobs to 1000 and set MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK to 10.
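For reference, assuming the usual knob_<name> = <value> convention under the [fdbserver] section of foundationdb.conf, the settings from my test would look roughly like this (treat the values as a starting point for tuning, not a universal recommendation):

[fdbserver]
knob_dd_bulkdump_parallelism = 1000
knob_dd_bulkload_parallelism = 1000
knob_manifest_count_max_per_bulkload_task = 10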
Thank you for your guidance! Following your suggestions, I expanded my cluster to 42 storage servers (SSes) and set the three configuration knobs discussed above.
The bulkload time for the original 15 GB dataset has been reduced to 5 minutes. I then generated a 150 GB dataset, which completed bulkloading in 10 minutes. This suggests that larger datasets may better utilize disk I/O performance.
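(For reference, that works out to roughly 15 GB / 5 min ≈ 50 MB/s versus 150 GB / 10 min ≈ 250 MB/s of logical key-value bytes, so per-second throughput improved about 5x with the larger dataset.)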
Also, I’m considering converting raw data files (like CSV or TXT) directly into a bulkload-compatible format, bypassing FDB’s bulkdump step. This would let me use bulkload directly for faster data initialization. Would this approach be viable? I’d appreciate any advice you might have. Thank you!
Yes! We can use the RocksDB SST file writer to create the SST files. The key is to build a directory on S3 with an organization compatible with the bulkload mechanism.
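For example, a minimal sketch of writing one sorted chunk with rocksdb::SstFileWriter might look like this (the function name and the key-value source are placeholders, and error handling is reduced to asserts):

#include <cassert>
#include <string>
#include <utility>
#include <vector>
#include <rocksdb/options.h>
#include <rocksdb/sst_file_writer.h>

// Write one sorted chunk of key-value pairs into a single SST data file.
// Keys must be added in strictly increasing order, or Put() returns an error.
void writeChunkToSst(const std::vector<std::pair<std::string, std::string>>& sortedKVs,
                     const std::string& outPath) { // e.g. a <version>-data.sst path
    rocksdb::Options options;
    rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
    rocksdb::Status s = writer.Open(outPath);
    assert(s.ok());
    for (const auto& kv : sortedKVs) {
        s = writer.Put(kv.first, kv.second);
        assert(s.ok());
    }
    s = writer.Finish(); // seals the file and writes the SST footer
    assert(s.ok());
}

Each such file would become the data file of one child folder in the layout described below.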
In general, you can check the definitions in foundationdb/fdbclient/include/fdbclient/BulkLoading.h to get an overview of the organization of the dataset folder.
The dataset folder name is a UID::toString(). In the folder, there is a single global “job-manifest.txt” file. This file records the file path of the dataset files for each subrange. Note that the subranges should cover the entire key space; I will explain the empty sub-range case a bit later. For the file format, please check the comments in BulkLoadJobManifestFileHeader::toString() and BulkLoadJobFileManifestEntry::toString().
The folder can contain multiple sub-folders. Each sub-folder name is a UID::toString(). Inside, the child folders are named by index, i.e., 0, 1, 2, 3, … Each child folder contains a std::to_string(Version)-manifest.txt; for the manifest content format, please check BulkLoadManifest::toString(). In addition to the manifest file, there is a data file named std::to_string(Version)-data.sst. You can fake a version for the file name, but note that the Version value must be the same within a given child folder. The byte sample file is optional.
Note that if the sub-range is empty, you still need the manifest file, but the data file is omitted. In the manifest file, the keyCount and the bytes should be 0 for the empty subrange.
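To put this together, the layout would look roughly like the following (the UIDs, indices, and versions in angle brackets are made-up placeholders):

<jobUID>/                      <- dataset folder, named by UID::toString()
  job-manifest.txt             <- global manifest covering the whole key space
  <batchUID>/                  <- sub-folder, named by UID::toString()
    0/
      <version>-manifest.txt
      <version>-data.sst
    1/                         <- empty sub-range: manifest only, no data file
      <version>-manifest.txt
    ...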
When you load this dataset, you can simply pass the faked global UID as the bulkload input.
Please let me know if you have more questions. Thanks!
If the raw data is unsorted, I need to sort all of it first (since SST files can’t have overlapping key ranges).
Then I would split the sorted data into chunks and create the SST/manifest files for bulk loading.
My biggest hurdle right now is efficiently performing global sorting on extremely large files.
Also, I appreciate your trust! I’ll share my progress as I work on this. Once the code is stable, I’ll submit a PR. Thanks!
Off the top of my head, for huge files you can do a parallel merge sort.
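A very rough sketch of that idea in C++ (sort fixed-size chunks in parallel, then do a k-way merge of the sorted runs with a min-heap; the one-record-per-line format and the run-file naming are assumptions for illustration):

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <future>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Sort one chunk of records and write it to a temporary "run" file.
static void sortAndSpill(std::vector<std::string> chunk, const std::string& runPath) {
    std::sort(chunk.begin(), chunk.end());
    std::ofstream out(runPath);
    for (const auto& line : chunk) out << line << '\n';
}

// Phase 1: split the input into chunks, sort them in parallel, spill to disk.
// Phase 2: k-way merge the sorted runs with a min-heap into one sorted output.
// In practice you would bound the number of concurrent sort jobs and size the
// chunks based on available memory.
void externalMergeSort(const std::string& inPath, const std::string& outPath,
                       size_t linesPerChunk = 1'000'000) {
    std::ifstream in(inPath);
    std::vector<std::string> runPaths;
    std::vector<std::future<void>> jobs;
    std::vector<std::string> chunk;
    std::string line;
    while (std::getline(in, line)) {
        chunk.push_back(line);
        if (chunk.size() == linesPerChunk) {
            runPaths.push_back(outPath + ".run" + std::to_string(runPaths.size()));
            jobs.push_back(std::async(std::launch::async, sortAndSpill,
                                      std::move(chunk), runPaths.back()));
            chunk.clear();
        }
    }
    if (!chunk.empty()) {
        runPaths.push_back(outPath + ".run" + std::to_string(runPaths.size()));
        jobs.push_back(std::async(std::launch::async, sortAndSpill,
                                  std::move(chunk), runPaths.back()));
    }
    for (auto& j : jobs) j.get();

    // Min-heap of (current line, run index); pop the smallest, refill from that run.
    std::vector<std::ifstream> runs;
    for (const auto& p : runPaths) runs.emplace_back(p);
    using Entry = std::pair<std::string, size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (size_t i = 0; i < runs.size(); ++i)
        if (std::getline(runs[i], line)) heap.emplace(line, i);
    std::ofstream out(outPath);
    while (!heap.empty()) {
        auto [smallest, idx] = heap.top();
        heap.pop();
        out << smallest << '\n';
        if (std::getline(runs[idx], line)) heap.emplace(line, idx);
    }
}

The sorted output (or the individual sorted runs, if you choose chunk boundaries that match your target subranges) can then be fed to the SST writer shown earlier.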
Note that when creating SST files, you may want to keep each SST file relatively small. The bulkload creates FDB shards to load data, so the resulting shard size equals the size of a single SST file multiplied by knob_manifest_count_max_per_bulkload_task. In my test, each SST file was around 15 MB. Since my knob_manifest_count_max_per_bulkload_task was 10, each shard was about 150 MB after the bulkload, which does not trigger any extra shard-boundary-change data movement after the bulkload, given my default FDB target shard size. A nice SST file size gives us the flexibility to tune knob_manifest_count_max_per_bulkload_task and avoid those extra data movements.