fdbserver crashes with 'ERROR: Out of memory'

foundationdb version : 7.1.9
foundationdb cluster : only one single fdbserver on one machine
os : 5.15.0-46-generic #49~20.04.1-Ubuntu
MEM : total-58Gi

The problem occurs when I try to restart my fdbserver from an existing data directory with the command below:

./bin/fdbserver --cluster-file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen-address public --logdir /var/log/foundationdb --memory 0 --public-address auto:4500 --storage-memory 15GiB --cache-memory 20GiB

But fdbserver failed to start with 'ERROR: Out of memory'. I watched the process, and the memory it used
was only 7.4G, far below both the OS memory and the storage-memory I set.

os-memory as below :
01:31:44 AM HKT kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
01:31:45 AM HKT     22.6G     31.7G     26.1G     44.5%      1.4G      7.9G     57.7G     94.9%      6.8G     28.0G      9.8M
01:31:46 AM HKT     22.6G     31.7G     26.1G     44.5%      1.4G      7.9G     57.7G     94.9%      6.8G     28.0G      9.8M

fdbserver-memory before exit with ERROR:out of memory
 PID      USER    PR. NI   VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3958902 root      20   0 7347972   6.7g  41604 t   0.0  11.3   0:26.63 fdbserver        01:29:18

The total data in existing dir as below:

root@dev-39:/var/lib/foundationdb/data/4500# du -sh *
28K     coordination-0.fdq
16K     coordination-1.fdq
100M    log2-V_6-b80bd5f327afc9307a421d78a11ab083.sqlite
108K    log2-V_6-b80bd5f327afc9307a421d78a11ab083.sqlite-wal
571M    logqueue-V_6-b80bd5f327afc9307a421d78a11ab083-0.fdq
481M    logqueue-V_6-b80bd5f327afc9307a421d78a11ab083-1.fdq
4.0K    processId
6.4G    storage-c0cf7e54a0b77d427e1a2b35f981bcc3-0.fdq
5.4G    storage-c0cf7e54a0b77d427e1a2b35f981bcc3-1.fdq

Trace log as below:

trace.127.0.0.1.4500.1663730645.rnq3OX.1.21.xml:<Event Severity="40" ErrorKind="Unset" Time="1663749598.772305" DateTime="2022-09-21T08:39:58Z" Type="OutOfMemory" ID="0000000000000000" Message="Out of memory" ThreadID="18256774105133719707" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x4049a6a 0x4048f31 0x40490f7 0x400d16a 0x400d1a0 0x4360970 0x3f6a6d6 0x1b683da 0x1b74d2e 0x1b752c0 0x1b757d9 0x1c96ae4 0x1c96f31 0x1518760 0x1ca49cd 0x25e2876 0x25e3627 0x25e8809 0x3fd4988 0x101f853 0x29ce6a6ee083" Machine="127.0.0.1:4500" LogGroup="default" Roles="CC,CD,CP,DD,GP,MS,RK,RV,SS,TL" />
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /usr/include/c++/9/bits/basic_string.h:455
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*) at /usr/include/c++/9/bits/basic_string.h:6063
 (inlined by) BaseTraceEvent::backtrace(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Trace.cpp:1216
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local() const at /usr/include/c++/9/bits/basic_string.h:226
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() at /usr/include/c++/9/bits/basic_string.h:235
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /usr/include/c++/9/bits/basic_string.h:662
 (inlined by) BaseTraceEvent::log() at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Trace.cpp:1235
BaseTraceEvent::~BaseTraceEvent() at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Trace.cpp:1275
criticalError at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Trace.h:543
allocate(unsigned long, bool) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Platform.actor.cpp:2066
operator new(unsigned long) at ??:?
ArenaBlock::addUsed(int) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Arena.cpp:234 (discriminator 3)
 (inlined by) ArenaBlock::allocate(Reference<ArenaBlock>&, int) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Arena.cpp:297 (discriminator 3)
StringBuffer::alignReserve(int, int) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/DiskQueue.actor.cpp:95
DiskQueue::addEmptyPage() at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/DiskQueue.actor.cpp:1137
DiskQueue::backPage() at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/DiskQueue.actor.cpp:1545
 (inlined by) DiskQueue::push(StringRef) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/DiskQueue.actor.cpp:901
DiskQueue_PopUncommitted::push(StringRef) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/DiskQueue.actor.cpp:1600
KeyValueStoreMemory<IKeyValueContainer>::log_op(KeyValueStoreMemory<IKeyValueContainer>::OpType, StringRef, StringRef) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/KeyValueStoreMemory.actor.cpp:448
 (inlined by) KeyValueStoreMemory<IKeyValueContainer>::SnapshotActorState<KeyValueStoreMemory<IKeyValueContainer>::SnapshotActor>::a_body1cont1loopBody1cont1(Void const&, int) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/KeyValueStoreMemory.actor.cpp:800
ActorCallback<KeyValueStoreMemory<IKeyValueContainer>::SnapshotActor, 1, Void>::fire(Void const&) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/flow.h:1319
void SAV<Void>::send<Void>(Void&&) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/flow.h:659
 (inlined by) void Promise<Void>::send<Void>(Void&&) const at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/flow.h:906
 (inlined by) Notified<MetricHandle<ContinuousMetric<long> >, long>::set(long const&) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbclient/Notified.h:70
KeyValueStoreMemory<IKeyValueContainer>::commit(bool) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/KeyValueStoreMemory.actor.cpp:181 (discriminator 4)
(anonymous namespace)::UpdateStorageActorState<(anonymous namespace)::UpdateStorageActor>::a_body1loopBody1cont11cont1(Void const&, int) [clone .isra.0] at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/storageserver.actor.cpp:7403
(anonymous namespace)::UpdateStorageActorState<(anonymous namespace)::UpdateStorageActor>::a_body1loopBody1cont6loopBody1(int) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29954
 (inlined by) a_body1loopBody1cont11 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29696
 (inlined by) a_body1loopBody1cont6break1 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29798
 (inlined by) a_body1loopBody1cont6loopBody1 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29718
ActorCallback<(anonymous namespace)::UpdateStorageActor, 4, Void>::fire(Void const&) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29708
 (inlined by) a_body1loopBody1cont6 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29437
 (inlined by) a_body1loopBody1cont5break1 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29483
 (inlined by) a_body1loopBody1cont5loopBody1cont3 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29594
 (inlined by) a_body1loopBody1cont5loopBody1cont1when1 at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29614
 (inlined by) a_callback_fire at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/build/fdbserver/storageserver.actor.g.cpp:29635
 (inlined by) fire at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/flow.h:1318
void SAV<Void>::send<Void>(Void&&) at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/flow.h:659
 (inlined by) void Promise<Void>::send<Void>(Void&&) const at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/flow.h:906
 (inlined by) N2::PromiseTask::operator()() at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Net2.actor.cpp:1201
 (inlined by) N2::Net2::run() at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/flow/Net2.actor.cpp:1548
main at /home/gpadmin/pie-db/fdb-dpkg/foundationdb/fdbserver/fdbserver.actor.cpp:2141 (discriminator 5)

Please have a look at this thread in case you see similarities. I have updated it with the actions that helped us solve it.

Got it. Thanks for your reply.

That thread relates to the Redwood storage engine, but the author does not seem to be using it.

@ZhangHuiGui the reason is that you have not specified the memory parameter in your /etc/foundationdb/foundationdb.conf. By default it is 8 GB, but both storage-memory and cache-memory are set to larger values, so your configuration is not valid.

  1. storage-memory is used only by the memory storage engine; all other storage engines use cache-memory. You can check your current storage engine with fdbcli --exec "status details"
  2. If your fdbserver plays all the roles (the default), the minimum acceptable value of memory is cache-memory + 6GiB or storage-memory + 6GiB, depending on the storage engine
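
To make the rule of thumb above concrete, here is a minimal sketch of the arithmetic in Python. The function name, the engine-name check, and treating the 6GiB overhead as a fixed constant are assumptions made for illustration; they come from the rule stated in this reply, not from the FoundationDB source. The backtrace in the question goes through KeyValueStoreMemory, which suggests the memory engine is in use, so the example applies the storage-memory branch to the values from the original command line.

```python
GIB = 1024 ** 3

def min_memory_bytes(storage_engine, storage_memory, cache_memory, overhead=6 * GIB):
    """Smallest --memory value suggested by the rule of thumb above.

    Assumption: engine names starting with "memory" are sized by
    storage-memory, and every other engine is sized by cache-memory.
    The 6 GiB overhead is the figure quoted in this thread.
    """
    if storage_engine.startswith("memory"):
        return storage_memory + overhead
    return cache_memory + overhead

# Values from the command line in the question:
# --storage-memory 15GiB --cache-memory 20GiB
required = min_memory_bytes("memory", 15 * GIB, 20 * GIB)
print(f"memory should be at least {required // GIB}GiB")
```

Under these assumptions the process in the question would need memory set to at least 21GiB with the memory engine, or 26GiB if it were using an engine sized by cache-memory.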