BulkDump Failed

Hello everyone, during the bulkdump process, I found that bulkdump got stuck.
I noticed one process reported an error: “Last logged error: SystemError: internal_error at Tue Aug 12 16:10:44 2025”.
I checked the logs and found the following error messages.

<Event Severity="40" ErrorKind="BugDetected" Time="1754985998.538776" DateTime="2025-08-12T08:06:38Z" Type="InternalError" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" FailedAssertion="res.second" File="/home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/storageserver.actor.cpp" Line="6069" ThreadID="4157295718276064864" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c392c 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="40" ErrorKind="Unset" Time="1754985998.538776" DateTime="2025-08-12T08:06:38Z" Type="SystemError" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" ThreadID="4157295718276064864" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c3d70 0x59c393f 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="20" Time="1754985998.538776" DateTime="2025-08-12T08:06:38Z" Type="SSBulkDumpError" ID="3a6fc13218dde3ff" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" TaskID="eeda558e1c730723" TaskRange="\x00\x05\x01\x00\x00\x00\x03\x00\x0f\x00\x00\x00s70369118225152\x00\x00\x00\x00\x0c\x00\x00\x00\x00 - \x00\x05\x01\x00\x00\x00\x03\x00\xff" JobID="8c0bd4584e85dc2e" RetryCount="1" BatchNum="0" ThreadID="4157295718276064864" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="40" ErrorKind="BugDetected" Time="1754985999.578606" DateTime="2025-08-12T08:06:39Z" Type="InternalError" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" FailedAssertion="res.second" File="/home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/storageserver.actor.cpp" Line="6069" ThreadID="4157295718276064864" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c392c 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="40" ErrorKind="Unset" Time="1754985999.578606" DateTime="2025-08-12T08:06:39Z" Type="SystemError" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" ThreadID="4157295718276064864" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c3d70 0x59c393f 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="20" Time="1754985999.578606" DateTime="2025-08-12T08:06:39Z" Type="SSBulkDumpError" ID="3a6fc13218dde3ff" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" TaskID="eeda558e1c730723" TaskRange="\x00\x05\x01\x00\x00\x00\x03\x00\x0f\x00\x00\x00s70369118225152\x00\x00\x00\x00\x0c\x00\x00\x00\x00 - \x00\x05\x01\x00\x00\x00\x03\x00\xff" JobID="8c0bd4584e85dc2e" RetryCount="2" BatchNum="0" ThreadID="4157295718276064864" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="40" ErrorKind="BugDetected" Time="1754986000.620680" DateTime="2025-08-12T08:06:40Z" Type="InternalError" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" FailedAssertion="res.second" File="/home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/storageserver.actor.cpp" Line="6069" ThreadID="4157295718276064864" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c392c 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="40" ErrorKind="Unset" Time="1754986000.620680" DateTime="2025-08-12T08:06:40Z" Type="SystemError" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" ThreadID="4157295718276064864" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c3d70 0x59c393f 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="20" Time="1754986000.620680" DateTime="2025-08-12T08:06:40Z" Type="SSBulkDumpError" ID="3a6fc13218dde3ff" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" TaskID="eeda558e1c730723" TaskRange="\x00\x05\x01\x00\x00\x00\x03\x00\x0f\x00\x00\x00s70369118225152\x00\x00\x00\x00\x0c\x00\x00\x00\x00 - \x00\x05\x01\x00\x00\x00\x03\x00\xff" JobID="8c0bd4584e85dc2e" RetryCount="3" BatchNum="0" ThreadID="4157295718276064864" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
<Event Severity="10" Time="1754986001.488503" DateTime="2025-08-12T08:06:41Z" Type="LocalConfigurationMetrics" ID="1c2373489b397cc1" Elapsed="5.00001" Snapshots="0 -1 0" ChangeRequestsFetched="0 -1 0" Mutations="0 -1 0" ThreadID="4157295718276064864" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" TrackLatestType="Original" />

I also found some log entries like this:

trace.xx.xx.xx.xx.4514.1754919473.0AWGtW.1.16.xml:9933:<Event Severity="30" Time="1754967055.597547" DateTime="2025-08-12T02:50:55Z" Type="FetchKeysTooLong" ID="0000000000000000" Duration="47400" Phase="1" Begin="\x00\x05\x01\x00\x00\x00\x05\x00\x0f\x00\x00\x00c65970941749072\x00\x00\x00\x00\x04\x00\x00\x00\x0a\x00\x00\x00ad600eac6a\x0303b-22df--" End="\x00\x05\x01\x00\x00\x00\x05\x00\x0f\x00\x00\x00c65970967742375\x00\x00\x00\x00\x04\x00\x00\x00\x0a\x00\x00\x00c58d2abe0e\x0375-70d8195" ThreadID="5391159093553932568" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
trace.xx.xx.xx.xx.4514.1754919473.0AWGtW.1.16.xml:10016:<Event Severity="30" Time="1754967066.732992" DateTime="2025-08-12T02:51:06Z" Type="FetchKeysTooLong" ID="0000000000000000" Duration="47400" Phase="0" Begin="\x00\x05\x01\x00\x00\x00\x04\x00\x0f\x00\x00\x00f35184394116375\x00\x00\x00\x00\x09\x00\x00\x00\x0a\x00\x00\x00ebef6ca8ba\x00" End="\x00\x05\x01\x00\x00\x00\x04\x00\x0f\x00\x00\x00f35184399377772\x00\x00\x00\x00\x09\x00\x00\x00\x0a\x00\x00\x00dc0c17373d\x00" ThreadID="5391159093553932568" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
trace.xx.xx.xx.xx.4514.1754919473.0AWGtW.1.16.xml:10132:<Event Severity="30" Time="1754967086.120719" DateTime="2025-08-12T02:51:26Z" Type="FetchKeysTooLong" ID="0000000000000000" Duration="47400" Phase="0" Begin="\x00\x05\x01\x00\x00\x00\x05\x00\x0f\x00\x00\x00p26388279457025\x00\x00\x00\x00\x1a\x00\x00\x00\x0f\x00\x00\x00c26388501782696\x033740242-a1" End="\x00\x05\x01\x00\x00\x00\x05\x00\x0f\x00\x00\x00p26388279559040\x00\x00\x00\x00\x1a\x00\x00\x00\x0f\x00\x00\x00c43980737499169\x03-1-5c85c-f" ThreadID="5391159093553932568" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />
trace.xx.xx.xx.xx.4514.1754919473.0AWGtW.1.16.xml:10192:<Event Severity="30" Time="1754967096.086297" DateTime="2025-08-12T02:51:36Z" Type="FetchKeysTooLong" ID="0000000000000000" Duration="47400" Phase="0" Begin="\x00\x05\x01\x00\x00\x00\x05\x00\x0f\x00\x00\x00p13194139544258\x00\x00\x00\x00\x1a\x00\x00\x00\x0f\x00\x00\x00c13194349649072\x03488-e5dd-0" End="\x00\x05\x01\x00\x00\x00\x05\x00\x0f\x00\x00\x00p13194139609193\x00\x00\x00\x00\x1a\x00\x00\x00\x0f\x00\x00\x00c43980541495133\x03ec97616a27" ThreadID="5391159093553932568" Machine="xx.xx.xx.xx:4514" LogGroup="default" Roles="SS" />

Could anyone please help analyze, based on the logs, what might have caused this issue?
Additionally, I performed a query during the bulkdump process, and I’m not sure if this could have been the cause of the issue.
FDB version: 7.4.1

It might be because some keys exceeded the 10KB limit when I generated the SST files, which were then imported using the bulkload method. Since bulkload bypasses the transaction system, the key length wasn’t checked, leading to the failure during bulk dump. I will verify this further.
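A check of this kind could run in the SST-generation step, before anything is handed to bulkload. The sketch below is only an illustration: `oversized_entries` is a hypothetical helper, not part of FoundationDB or of any SST library, and it assumes the key/value pairs are available as Python `bytes` before the file is written. The limits used are FoundationDB's documented transactional limits (10,000 bytes per key, 100,000 bytes per value).

```python
from typing import Iterable, Iterator, Tuple

# FoundationDB's documented limits for keys and values written through transactions.
MAX_KEY_BYTES = 10_000
MAX_VALUE_BYTES = 100_000


def oversized_entries(
    pairs: Iterable[Tuple[bytes, bytes]],
    max_key: int = MAX_KEY_BYTES,
    max_value: int = MAX_VALUE_BYTES,
) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (key, key_len, value_len) for every pair that exceeds a limit.

    Hypothetical helper: meant to run on the data that is about to be
    written into an SST file, since bulkload itself did not reject it.
    """
    for key, value in pairs:
        if len(key) > max_key or len(value) > max_value:
            yield key, len(key), len(value)


if __name__ == "__main__":
    sample = [
        (b"ok-key", b"v"),
        (b"k" * 10_001, b"v"),  # one byte over the key limit
    ]
    for key, klen, vlen in oversized_entries(sample):
        # Print only a short prefix; the full key may be very large.
        print(f"reject: key prefix {key[:16]!r}, key={klen} B, value={vlen} B")
```

Rejecting or logging these entries at generation time keeps them out of the cluster, instead of having to clear them afterwards.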

Thanks for the report.

Any chance of some more details on your setup at the time of crash? What OS? Where did you get the 7.4.1 from? What does your cluster profile look like? Can you try running ‘addr2line -e fdbserver.debug -p -C -f -i 0x5a9ccaf 0x5a9cfa9 0x5a97104 0x59c392c 0x36457a9 0x3644bce 0x3645c18 0x3645a62 0x3640d38 0x3640a62 0x32d49f8 0x363198d 0x362f702 0x362d5c8 0x362cd23 0x362d275 0x362bba6 0x2bb9368 0x2be1255 0x5a21088 0x3551b97 0x7f36462f81b7’ replacing ‘fdbserver.debug’ w/ path to your actual binary and paste in here what you get. (I tried locally but my backtrace seems off with mention of ActorFuzz). Are there core dumps? If so, can you get a backtrace from it?

The FetchKeysTooLong events likely did not cause the crash. They are just warnings.

Michael
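The distinction between warnings and errors can be read from the `Severity` attribute of each trace event: in the logs above, the `InternalError` and `SystemError` events carry `Severity="40"`, while `FetchKeysTooLong` carries `Severity="30"`. A minimal sketch for tallying events by severity follows. The function names are hypothetical, and it assumes each event sits on one line as a self-closing `<Event ... />` element, optionally preceded by a grep-style `file:line:` prefix as in the excerpts above.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Optional

SEV_WARN_ALWAYS = 30  # severity of the FetchKeysTooLong events in this thread
SEV_ERROR = 40        # severity of the InternalError / SystemError events


def parse_event(line: str) -> Optional[Dict[str, str]]:
    """Return the attributes of one trace line, or None if it is not an event."""
    start = line.find("<Event ")
    if start < 0:
        return None
    try:
        # Skip any "file:line:" prefix left by grep and parse the element itself.
        return dict(ET.fromstring(line[start:].strip()).attrib)
    except ET.ParseError:
        return None


def count_by_type(lines: Iterable[str], min_severity: int) -> Counter:
    """Count events at or above min_severity, keyed by (severity, type)."""
    counts: Counter = Counter()
    for line in lines:
        event = parse_event(line)
        if event and int(event.get("Severity", "0")) >= min_severity:
            counts[(int(event["Severity"]), event.get("Type", "?"))] += 1
    return counts
```

Calling `count_by_type(open(path), SEV_ERROR)` on a trace file would list only the error-level events, which makes it easier to see whether the warnings and the failed assertion occur around the same time.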

Thank you for your reply. After directly deleting the keys exceeding 10KB in length via the deleterange command in fdbcli, I successfully re-executed the bulkdump task. This suggests that the failure was likely caused by the bulkload importing data with excessively long keys.
OS: CentOS Linux release 8.2.2004
The FDB package was downloaded from GitHub (Releases · apple/foundationdb · GitHub).

Glad you figured it out. I filed "Bulkdump stuck because of FetchKeysTooLong" (Issue #12303 · apple/foundationdb · GitHub) to dig into why the FetchKeysTooLong warning makes bulkdump get stuck.

S

Hi Michael,
My previous information might have been slightly misleading. The excessively long keys I deleted start with \x00\x05\x01\x00\x00\x00\x03\x00N. These keys do not fall between the Begin and End of the ranges in the FetchKeysTooLong logs. This suggests that FetchKeysTooLong may not be the decisive factor. However, manually generating SST files with deliberately oversized keys and then bulkloading them should still reproduce the issue.
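The range comparison described here can be sketched in a few lines. The helpers `decode_trace_key` and `in_range` are hypothetical names, and the decoder assumes the trace format writes non-printable bytes as `\xNN` and a literal backslash as `\\`. Keys are compared bytewise and ranges are treated as half-open (`begin <= key < end`).

```python
import re

# Matches either a \xNN byte escape or an escaped backslash.
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|\\\\")


def decode_trace_key(text: str) -> bytes:
    """Turn a trace-style escaped key back into raw bytes (assumed format)."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("latin-1")
        out += bytes([int(m.group(1), 16)]) if m.group(1) else b"\\"
        pos = m.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def in_range(key: bytes, begin: bytes, end: bytes) -> bool:
    """Half-open range check; Python compares bytes lexicographically."""
    return begin <= key < end


if __name__ == "__main__":
    # Prefix of the deleted keys, as given in this post.
    prefix = decode_trace_key(r"\x00\x05\x01\x00\x00\x00\x03\x00N")

    # Begin/End of one FetchKeysTooLong event from the logs above.
    fk_begin = decode_trace_key(
        r"\x00\x05\x01\x00\x00\x00\x04\x00\x0f\x00\x00\x00f35184394116375"
        r"\x00\x00\x00\x00\x09\x00\x00\x00\x0a\x00\x00\x00ebef6ca8ba\x00"
    )
    fk_end = decode_trace_key(
        r"\x00\x05\x01\x00\x00\x00\x04\x00\x0f\x00\x00\x00f35184399377772"
        r"\x00\x00\x00\x00\x09\x00\x00\x00\x0a\x00\x00\x00dc0c17373d\x00"
    )

    # TaskRange of the SSBulkDumpError events from the logs above.
    task_begin = decode_trace_key(
        r"\x00\x05\x01\x00\x00\x00\x03\x00\x0f\x00\x00\x00s70369118225152"
        r"\x00\x00\x00\x00\x0c\x00\x00\x00\x00"
    )
    task_end = decode_trace_key(r"\x00\x05\x01\x00\x00\x00\x03\x00\xff")

    print("in FetchKeysTooLong range:", in_range(prefix, fk_begin, fk_end))
    print("in SSBulkDumpError TaskRange:", in_range(prefix, task_begin, task_end))
```

With the values copied from this thread, the prefix sorts before the FetchKeysTooLong range but appears to fall inside the TaskRange of the failing SSBulkDumpError events, which would be consistent with the oversized keys, rather than the FetchKeysTooLong warnings, being tied to the stuck task.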

I will take a look. Thank you for the added detail.
Michael

Hello, I am looking into this issue. You create SST files manually and then bulkload and then bulkdump. Do I understand correctly?

Thanks!

Yes, due to a bug on my side, some data with key lengths far exceeding 10KB was mixed in, but it was still bulkloaded successfully.

Thank you so much! Bulkload should also perform this check before injecting the data into the database.