Foundationdb 6.2 - fdbserver going out of memory

@ajbeamon - Looking at the GetMagazineSample trace from 10.196.76.47 when the memory jumped from ~2.44 GB to ~6.8 GB. The below backtraces to be more common during this period.

TraceEvent::backtrace(std::string const&) at ??:?
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.tcc:613
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:5958
 (inlined by) TraceEvent::backtrace(std::string const&) at /opt/foundation/foundationdb/flow/Trace.cpp:938
std::string::_M_rep() const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3313
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3631
 (inlined by) FastAllocator<64>::getMagazine() at /opt/foundation/foundationdb/flow/FastAlloc.cpp:449
FastAllocator<64>::allocate() at /opt/foundation/foundationdb/flow/FastAlloc.cpp:272
IndexedSet<Standalone<StringRef>, long>::Node::Node<StringRef&, int>(StringRef&, int&&, IndexedSet<Standalone<StringRef>, long>::Node*) at /opt/foundation/foundationdb/./flow/serialize.h:748
 (inlined by) IndexedSet<Standalone<StringRef>, long>::iterator IndexedSet<Standalone<StringRef>, long>::insert<StringRef&, int>(StringRef&, int&&, bool) at /opt/foundation/foundationdb/./flow/IndexedSet.h:635
(anonymous namespace)::ApplyByteSampleResultActorState<(anonymous namespace)::ApplyByteSampleResultActor>::a_body1loopBody1cont1(Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> > const&, int) at /opt/foundation/foundationdb/./flow/serialize.h:748
Reference<ArenaBlock>::~Reference() at /opt/foundation/foundationdb/./flow/serialize.h:748
 (inlined by) Arena::~Arena() at /opt/foundation/foundationdb/./flow/Arena.h:92
 (inlined by) Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> >::~Standalone() at /opt/foundation/foundationdb/./flow/Arena.h:479
 (inlined by) fire at /opt/foundation/foundationdb/./flow/flow.h:998
void SAV<Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> > >::send<Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> >&>(Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> >&) at /opt/foundation/foundationdb/./flow/flow.h:529
 (inlined by) void Promise<Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> > >::send<Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> >&>(Standalone<VectorRef<KeyValueRef, (VecSerStrategy)0> >&) const at /opt/foundation/foundationdb/./flow/flow.h:778
 (inlined by) a_body1cont1 at /opt/foundation/foundationdb/flow/genericactors.actor.h:1202
 (inlined by) a_body1when1 at /opt/foundation/foundationdb/flow/genericactors.actor.g.h:11733
 (inlined by) a_callback_fire at /opt/foundation/foundationdb/flow/genericactors.actor.g.h:11747
 (inlined by) fire at /opt/foundation/foundationdb/./flow/flow.h:998
void SAV<Void>::send<Void>(Void&&) at /opt/boost_1_67_0/boost/variant/detail/forced_return.hpp:31
Promise<Void>::~Promise() at /opt/foundation/foundationdb/flow/TDMetric.actor.h:1269
 (inlined by) N2::PromiseTask::~PromiseTask() at /opt/foundation/foundationdb/flow/Net2.actor.cpp:475
 (inlined by) N2::PromiseTask::operator()() at /opt/foundation/foundationdb/flow/Net2.actor.cpp:482
 (inlined by) N2::Net2::run() at /opt/foundation/foundationdb/flow/Net2.actor.cpp:657
main at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3346
?? ??:0

I also see the below trace during this period which seems to be the fix you added in 6.2.19.

addr2line -e fdbserver.debug -p -C -f -i 0x194478f 0x190e413 0x190e651 0x186d545 0x1863dea 0x72b1c2 0x765327 0x76559a 0x7656f1 0x768701 0x768775 0x768858 0xb46cbf 0xb4ad97 0xa2ef82 0xa2c0f9 0x18500a6 0x7ff4a0c5f5d0
TraceEvent::backtrace(std::string const&) at ??:?
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.tcc:613
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:5958
 (inlined by) TraceEvent::backtrace(std::string const&) at /opt/foundation/foundationdb/flow/Trace.cpp:938
std::string::_M_rep() const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3313
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3631
 (inlined by) FastAllocator<4096>::getMagazine() at /opt/foundation/foundationdb/flow/FastAlloc.cpp:449
FastAllocator<4096>::allocate() at /opt/foundation/foundationdb/flow/FastAlloc.cpp:272
EvictablePageCache::allocate(EvictablePage*) at /opt/foundation/foundationdb/fdbrpc/AsyncFileCached.actor.h:76
 (inlined by) AFCPage::AFCPage(AsyncFileCached*, long) at /opt/foundation/foundationdb/fdbrpc/AsyncFileCached.actor.h:576
AsyncFileCached::readZeroCopy(void**, int*, long) at /opt/foundation/foundationdb/./flow/FastAlloc.h:200
asyncReadZeroCopy(sqlite3_file*, void**, int, long long, int*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3346
readDbPage at /opt/foundation/foundationdb/fdbserver/sqlite/sqlite3.amalgamation.c:31953
sqlite3PagerAcquire at /opt/foundation/foundationdb/fdbserver/sqlite/sqlite3.amalgamation.c:34182
btreeGetPage at /opt/foundation/foundationdb/fdbserver/sqlite/btree.c:1567
getAndInitPage at /opt/foundation/foundationdb/fdbserver/sqlite/btree.c:1619
moveToChild at /opt/foundation/foundationdb/fdbserver/sqlite/btree.c:4188
moveToLeftmost at /opt/foundation/foundationdb/fdbserver/sqlite/btree.c:4349
RawCursor::moveNext() at /opt/foundation/foundationdb/./flow/flow.h:529
 (inlined by) RawCursor::DefragmentingReader::advance() at /opt/foundation/foundationdb/fdbserver/KeyValueStoreSQLite.actor.cpp:945
 (inlined by) RawCursor::DefragmentingReader::peek() at /opt/foundation/foundationdb/fdbserver/KeyValueStoreSQLite.actor.cpp:957
 (inlined by) RawCursor::getRange(KeyRangeRef, int, int) at /opt/foundation/foundationdb/fdbserver/KeyValueStoreSQLite.actor.cpp:1094
KeyValueStoreSQLite::Reader::action(KeyValueStoreSQLite::Reader::ReadRangeAction&) at /opt/foundation/foundationdb/./flow/flow.h:529
 (inlined by) TypedAction<KeyValueStoreSQLite::Reader, KeyValueStoreSQLite::Reader::ReadRangeAction>::operator()(IThreadPoolReceiver*) at /opt/foundation/foundationdb/./flow/IThreadPool.h:74
yield(TaskPriority) at /opt/foundation/foundationdb/./flow/flow.h:1018
 (inlined by) WorkPool<Coroutine, ThreadUnsafeSpinLock, true>::Worker::run() at /opt/foundation/foundationdb/fdbserver/CoroFlow.actor.cpp:151
Coroutine::wrapRun() at /opt/foundation/foundationdb/./flow/flow.h:629
 (inlined by) Coroutine::entry(void*) at /opt/foundation/foundationdb/fdbserver/CoroFlow.actor.cpp:89
Coro_StartWithArg at /opt/foundation/foundationdb/fdbrpc/libcoroutine/Coro.c:277
?? ??:0

Are these pointing to the same issue or there is some other thing important which I am missing?

io_submit() is KAIO, which is the Linux Kernel’s async IO support. If io_submit() is blocking for 250ms-400ms for you. That would indicate that there’s something very strange going on between FDB and your hardware. What version of Linux are you running? What filesystem? Which Distro? What environment are you running FDB in? (Bare metal? Cloud? Which cloud?)

@alexmiller - Ubuntu 16.04.6 running on ESXi 6.5.

From the trace file it seems io_submit started taking time when the process memory usage increased more than 5.3 GB. I don’t see this when the memory usage of the process was less. Also the frequency of this event is less. Is there a possibility that Net2SlowTaskTrace (pointing to io_submit taking time) a side effect of the high memory usage on the process?