Segmentation fault error and broken cluster

Version 5.1.7, custom build with the TLS plugin, commit:

commit 9ad8d02386d4a6a5efecf898df80f2747695c627 (HEAD, tag: 5.1.7)

Configuration

  • 3-node cluster with the double ssd configuration.
/usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --public_address auto:4500 --tls_certificate_file /etc/foundationdb/fdb.pem --tls_key_file /etc/foundationdb/fdb-key.pem --tls_plugin /usr/lib/foundationdb/plugins/libFDBLibTLS.so

Application

  • Golang application.

Cluster

The cluster worked for about 10 hours under an almost write-only load (30% CPU and 4 GB memory), then suddenly failed with OOM errors in the logs (even though there is a lot of free memory) and started restarting endlessly without success:

...
<Event Severity="10" Time="1527763249.658725" Type="GenerationRegReadRequest" Machine="192.168.1.1:4500" ID="0000000000000000" From="192.168.1.2:4500:tls" K="fdb_test_cluster:6KSAuefFw2PtTjwhomsLUdG4wQPeV123" logGroup="default"/>
...
<Event Severity="40" Time="1527763249.937558" Type="OutOfMemory" Machine="192.168.1.1:4500" ID="0000000000000000" Message="Out of memory" logGroup="default" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x136d55c 0x136c54a 0x1357019 0x135705c 0x13570d9 0x13239f5 0x13243b1 0x130a555 0x1307ee7 0x85ba6a 0x928187 0x9282db 0x930a52 0x629807 0x62b01f 0x6307e7 0x53fa52 0x53cf39 0x12b43f6 0x7fca32bbd000"/>

Stderr output

FDBD joined cluster.
ERROR: Out of memory

Stack traces

$ addr2line -e fdbserver.debug -p -C -f -i 0x136d55c 0x136c54a 0x1357019 0x135705c 0x13570d9 0x13239f5 0x13243b1 0x130a555 0x1307ee7 0x85ba6a 0x928187 0x9282db 0x928410 0x92b4e6 0x92b576 0x9312e3 0x6288dc 0x62afed 0x6307e7 0x53fa52 0x53cf39 0x12b43f6 0x7fbd26b27000
operator+<char, std::char_traits<char>, std::allocator<char> > at /usr/include/c++/4.9/bits/basic_string.h:2424
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:871
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:905
criticalError at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2310
platform::outOfMemory() at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2295
allocate(unsigned long, bool) at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:1511
FastAllocator<4096>::getMagazine() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:366
FastAllocator<4096>::allocate() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:221
EvictablePageCache::allocate(EvictablePage*) at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/AsyncFileCached.actor.h:72
 (inlined by) AFCPage::AFCPage(AsyncFileCached*, long) at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/AsyncFileCached.actor.h:448
insert<std::pair<long int, AFCPage*>, void> at /usr/include/c++/4.9/bits/hashtable_policy.h:911 (discriminator 3)
 (inlined by) insert<std::pair<long int, AFCPage*>, void> at /usr/include/c++/4.9/bits/unordered_map.h:400 (discriminator 3)
 (inlined by) AsyncFileCached::readZeroCopy(void**, int*, long) at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/AsyncFileCached.actor.cpp:153 (discriminator 3)
asyncReadZeroCopy at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/VFSAsync.cpp:137
readDbPage at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/sqlite/sqlite3.amalgamation.c:31950
sqlite3PagerAcquire at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/sqlite/sqlite3.amalgamation.c:34179
btreeGetPage at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/sqlite/btree.c:1567
getAndInitPage at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/sqlite/btree.c:1620
moveToChild at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/sqlite/btree.c:4188
sqlite3BtreeMovetoUnpacked at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/sqlite/btree.c:4709
RawCursor::moveTo(StringRef, bool) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/KeyValueStoreSQLite.actor.cpp:1168
RawCursor::getRange(KeyRangeRef, int, int) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/KeyValueStoreSQLite.actor.cpp:1089
KeyValueStoreSQLite::Reader::action(KeyValueStoreSQLite::Reader::ReadRangeAction&) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/KeyValueStoreSQLite.actor.cpp:1546
 (inlined by) TypedAction<KeyValueStoreSQLite::Reader, KeyValueStoreSQLite::Reader::ReadRangeAction>::operator()(IThreadPoolReceiver*) at /tmp/foundationdb/tmp/build/foundationdb/./flow/IThreadPool.h:74
yield at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:948
 (inlined by) WorkPool<Coroutine, ThreadUnsafeSpinLock, true>::Worker::run() at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/CoroFlow.actor.cpp:150
Coroutine::entry(void*) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/CoroFlow.actor.cpp:107
Coro_StartWithArg at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/libcoroutine/Coro.c:288
?? ??:0
addr2line -e fdbserver.debug -p -C -f -i 0x136d55c 0x136c54a 0x1357019 0x135705c 0x13570d9 0x1320f25 0x13218b1 0x4715a5 0x4777ba 0x136d233 0x136d58b 0x136c54a 0x1357019 0x135705c 0x13570d9 0x1320f25 0x13218b1 0x4715a5 0x4777ba 0x136d233 0x136d58b 0x136c54a 0x1357019 0x135705c 0x13570d9 0x1320f25 0x13218b1 0x4715a5 0x4777ba 0x136d233 0x136d58b 0x136c54a 0x1357019 0x135705c 0x13570d9 0x1320f25 0x13218b1 0x4715a5 0x5b3750 0x123f162 0x81859d 0x81a305 0x47a9e8 0xa2b925 0xa2ba62 0xa28558 0xa27c18
operator+<char, std::char_traits<char>, std::allocator<char> > at /usr/include/c++/4.9/bits/basic_string.h:2424
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:871
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:905
criticalError at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2310
platform::outOfMemory() at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2295
allocate(unsigned long, bool) at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:1511
FastAllocator<512>::getMagazine() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:366
FastAllocator<512>::allocate() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:221
ArenaBlock::create(int, Reference<ArenaBlock>&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:227 (discriminator 1)
Arena at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:288
 (inlined by) Standalone<StringRef>::Standalone(StringRef const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:339
TraceEvent::detail(char const*, char const*) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:721
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:871
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:905
criticalError at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2310
platform::outOfMemory() at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2295
allocate(unsigned long, bool) at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:1511
FastAllocator<512>::getMagazine() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:366
FastAllocator<512>::allocate() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:221
ArenaBlock::create(int, Reference<ArenaBlock>&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:227 (discriminator 1)
Arena at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:288
 (inlined by) Standalone<StringRef>::Standalone(StringRef const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:339
TraceEvent::detail(char const*, char const*) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:721
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:871
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:905
criticalError at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2310
platform::outOfMemory() at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2295
allocate(unsigned long, bool) at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:1511
FastAllocator<512>::getMagazine() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:366
FastAllocator<512>::allocate() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:221
ArenaBlock::create(int, Reference<ArenaBlock>&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:227 (discriminator 1)
Arena at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:288
 (inlined by) Standalone<StringRef>::Standalone(StringRef const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:339
TraceEvent::detail(char const*, char const*) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:721
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:871
std::string::_M_rep() const at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ~basic_string at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:905
criticalError at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2310
platform::outOfMemory() at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2295
allocate(unsigned long, bool) at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:1511
FastAllocator<512>::getMagazine() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:366
FastAllocator<512>::allocate() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:221
ArenaBlock::create(int, Reference<ArenaBlock>&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:227 (discriminator 1)
Arena::Arena(unsigned long) at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:289
operator new [] at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:312
 (inlined by) StringRef at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:397
 (inlined by) Standalone at /tmp/foundationdb/tmp/build/foundationdb/./flow/Arena.h:339
 (inlined by) sendPacket at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/FlowTransport.actor.cpp:829
getReply<TLogRejoinRequest> at /tmp/foundationdb/tmp/build/foundationdb/./fdbrpc/fdbrpc.h:237
 (inlined by) a_body1loopBody1cont1 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.cpp:1126
 (inlined by) a_body1loopBody1 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:5850
a_body1loopHead1 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:5805
 (inlined by) a_body1loopBody1cont6 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:5975
 (inlined by) a_body1loopBody1cont7 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:5981
 (inlined by) a_body1loopBody1cont8 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:5987
 (inlined by) a_body1loopBody1cont1when2 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:6007
 (inlined by) a_callback_fire at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogServer.actor.g.cpp:6048
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:928
void SAV<Void>::send<Void>(Void&&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:381
~Promise at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:720
 (inlined by) AsyncVar<ServerDBInfo>::setUnconditional(ServerDBInfo const&) at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.h:657
 (inlined by) AsyncVar<ServerDBInfo>::set(ServerDBInfo const&) at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.h:651
 (inlined by) a_body1loopBody1when1 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.cpp:471
fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:928 (discriminator 2)
SAV<ServerDBInfo>::finishSendAndDelPromiseRef() at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:413
 (inlined by) a_body1cont2 at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.g.h:10023
 (inlined by) a_body1when1 at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.g.h:10030
 (inlined by) a_callback_fire at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.g.h:10044
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:928
SAV<ServerDBInfo>::finishSendAndDelPromiseRef() at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:413
 (inlined by) a_body1cont2 at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:864
 (inlined by) a_body1when1 at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:871
 (inlined by) a_callback_fire at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:885
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:928

I’ve got the same with version 5.2.2:

commit b6b6b882af2f8ed3f921b52deb72afa58b4f2ce6 (HEAD, tag: 5.2.2)

Maybe I’m doing something wrong in my application code, but it definitely should not lead to totally broken clusters and segmentation faults.

Trace:

addr2line -e fdbserver.debug -p -C -f -i 0x13c7f0c 0x13c6e9a 0x13b1629 0x13b166c 0x13b16e9 0x137e005 0x137e9c1 0x13ac33f 0x13ac41b 0x4e9ad1 0x4f6ee1 0x4fbe5f 0x4fc286 0x1298014 0x45c136 0x43c478 0x43c738 0x43c9f5 0x47b398 0x49edd5 0x443021 0x44c00e 0x4e3289 0x12988d0 0x1298b58 0x47b398 0x1399e13 0x42a627 0x7efd131872e1
TraceEvent::backtrace(std::string) at /usr/include/c++/4.9/bits/basic_string.h:2424
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:874
TraceEvent::~TraceEvent() at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ?? at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:908
criticalError at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2310
platform::outOfMemory() at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:2295
allocate(unsigned long, bool) at /tmp/foundationdb/tmp/build/foundationdb/flow/Platform.cpp:1511
FastAllocator<4096>::getMagazine() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:366
FastAllocator<4096>::allocate() at /tmp/foundationdb/tmp/build/foundationdb/flow/FastAlloc.cpp:221
PacketWriter::nextBuffer() at /tmp/foundationdb/tmp/build/foundationdb/flow/serialize.h:565
 (inlined by) PacketWriter::nextBuffer() at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2Packet.cpp:59
PacketWriter::serializeBytesAcrossBoundary(void const*, int) at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2Packet.cpp:45
void save<PacketWriter, ReplyPromise<Void> >(PacketWriter&, RequestStream<ReplyPromise<Void> > const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/network.h:95
 (inlined by) void save<PacketWriter, ReplyPromise<Void> >(PacketWriter&, RequestStream<ReplyPromise<Void> > const&) at /tmp/foundationdb/tmp/build/foundationdb/./fdbrpc/fdbrpc.h:355
operator&<PacketWriter, RequestStream<ReplyPromise<Void> > > at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:66
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/TLogInterface.h:65
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:80
 (inlined by) save<PacketWriter, TLogInterface> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator&<PacketWriter, TLogInterface> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:66
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:170
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:80
 (inlined by) save<PacketWriter, Optional<TLogInterface> > at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator&<PacketWriter, Optional<TLogInterface> > at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:66
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/LogSystemConfig.h:48
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:80
 (inlined by) save<PacketWriter, OptionalInterface<TLogInterface> > at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator<< <PacketWriter, OptionalInterface<TLogInterface> > at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:54
 (inlined by) void save<PacketWriter, OptionalInterface<TLogInterface> >(PacketWriter&, std::vector<OptionalInterface<TLogInterface>, std::allocator<OptionalInterface<TLogInterface> > > const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:131
void ServerDBInfo::serialize<PacketWriter>(PacketWriter&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:607
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:115
 (inlined by) save<PacketWriter, int> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator&<PacketWriter, int> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:66
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/LogSystemConfig.h:103
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:80
 (inlined by) save<PacketWriter, OldTLogConf> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator<< <PacketWriter, OldTLogConf> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:54
 (inlined by) save<PacketWriter, OldTLogConf> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:131
 (inlined by) operator&<PacketWriter, std::vector<OldTLogConf> > at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:66
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/LogSystemConfig.h:205
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:80
 (inlined by) save<PacketWriter, LogSystemConfig> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator&<PacketWriter, LogSystemConfig> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:66
 (inlined by) void ServerDBInfo::serialize<PacketWriter>(PacketWriter&) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ServerDBInfo.h:55
MakeSerializeSource<SerializeBoolAnd<ServerDBInfo> >::serializePacketWriter(PacketWriter&) const at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:81
 (inlined by) save<PacketWriter, ServerDBInfo> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:87
 (inlined by) operator<< <PacketWriter, ServerDBInfo> at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:54
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:643
 (inlined by) MakeSerializeSource<SerializeBoolAnd<ServerDBInfo> >::serializePacketWriter(PacketWriter&) const at /tmp/foundationdb/tmp/build/foundationdb/./flow/serialize.h:627
sendPacket(TransportData*, ISerializeSource const&, Endpoint const&, bool) at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/FlowTransport.actor.cpp:877
ActorCallback<(anonymous namespace)::NetworkSenderActor<ServerDBInfo>, 0, ServerDBInfo>::fire(ServerDBInfo const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:636
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.h:96
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.h:96
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.h:96
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:1013
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:1085
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:1040
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:1046
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/genericactors.actor.g.h:1060
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:928
(anonymous namespace)::ClusterGetServerInfoActorState<(anonymous namespace)::ClusterGetServerInfoActor>::a_body1break1(int) [clone .isra.1670] at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:381
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./fdbrpc/fdbrpc.h:99
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.cpp:943
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:2149
(anonymous namespace)::ClusterGetServerInfoActorState<(anonymous namespace)::ClusterGetServerInfoActor>::a_body1loopBody1(int) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:2145
ActorCallback<(anonymous namespace)::ClusterGetServerInfoActor, 0, Void>::fire(Void const&) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:2114
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:2161
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:2167
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:2188
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:928
void SAV<Void>::send<Void>(Void&&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:381
AsyncVar<ServerDBInfo>::set(ServerDBInfo const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:720
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.h:657
 (inlined by) AsyncVar<ServerDBInfo>::set(ServerDBInfo const&) at /tmp/foundationdb/tmp/build/foundationdb/flow/genericactors.actor.h:651
clusterRegisterMaster(ClusterControllerData*, RegisterMasterRequest const&) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.cpp:1360
ActorSingleCallback<(anonymous namespace)::ClusterControllerCoreActor, 8, RegisterMasterRequest>::fire(RegisterMasterRequest const&) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.cpp:1749
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/ClusterController.actor.g.cpp:7198
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:938
NetNotifiedQueue<RegisterMasterRequest>::receive(ArenaReader&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:532
 (inlined by) NetNotifiedQueue<RegisterMasterRequest>::receive(ArenaReader&) at /tmp/foundationdb/tmp/build/foundationdb/./fdbrpc/fdbrpc.h:204
(anonymous namespace)::DeliverActorState<(anonymous namespace)::DeliverActor>::a_body1cont1(int) [clone .isra.330] at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/FlowTransport.actor.cpp:443
ActorCallback<(anonymous namespace)::DeliverActor, 0, Void>::fire(Void const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:929
void SAV<Void>::send<Void>(Void&&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:381
N2::Net2::run() at /tmp/foundationdb/tmp/build/foundationdb/flow/flow.h:720
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2.actor.cpp:467
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2.actor.cpp:474
 (inlined by) N2::Net2::run() at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2.actor.cpp:628
main at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/fdbserver.actor.cpp:1617 (discriminator 2)
?? ??:0

To be clear, the errors posted above are not segfaults but rather the process intentionally shutting itself down when it reaches its memory limit (default 8 GB). So the question, then, is why the processes are using so much memory.

Based on the command you pasted that you’re using to run fdbserver, it looks like you are using the default memory limit, but do let me know if that’s not the case.

I am currently aware of a couple of ways to run a server process out of memory (both of which are issues we are tracking). Does either of these match your situation?

  1. Repeatedly creating database objects (e.g. using one of the Open variants in Golang) leaks memory on the coordinators. For example, if you are opening a new database for each transaction, you will eventually run some of your server processes out of memory. See https://github.com/apple/foundationdb/issues/321.

  2. Aggressively high write loads can run the proxies out of memory. This is particularly likely if a heavy load is turned on suddenly, but I think it may also be possible if you are trying to write with unbounded parallelism (i.e. if your writers aren’t waiting for the result of one write before starting another). It’s reasonable to have multiple clients writing simultaneously, but if your writers are never waiting for commits to complete, then there could end up being lots of commits in flight. If you are writing heavy volumes, you could try turning down the rate at which you try to write (and/or bounding your parallelism) to see if that helps. Note that if your load is saturating the cluster, you’ll be able to do this without sacrificing throughput. See https://github.com/apple/foundationdb/issues/370 for the issue regarding this memory problem. (A rough Go sketch of both patterns follows this list.)
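
To make both points concrete, here is a rough Go sketch of the pattern we recommend: open the database once at startup and share it, and have each writer wait for its commit before starting the next one. The loop, key names, and counts are made up for illustration; only the binding calls are real.

package main

import (
    "fmt"
    "log"

    "github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
    fdb.MustAPIVersion(510) // match the API version to your client/server version

    // Open the database exactly once and share it between goroutines;
    // re-opening it per transaction leaks memory on the coordinators (issue #321).
    db := fdb.MustOpenDefault()

    // Each Transact call blocks until its commit completes (retrying on
    // retryable errors), so this loop has at most one commit in flight.
    // To add parallelism, run a small, fixed number of loops like this.
    for i := 0; i < 1000; i++ {
        _, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
            tr.Set(fdb.Key(fmt.Sprintf("example/key/%d", i)), []byte("value"))
            return nil, nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }
}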

It is also of course possible that you’ve encountered some other way to run the processes out of memory. If neither of the above seem relevant to you, we should try to investigate what’s going on. A good starting point would be a more detailed description of what kind of workload you are running, and it may be helpful to see the trace logs from the failing processes as well.

Thank you for the answer.

It crashes near 4GB of memory.

Yes, default settings for memory.

No, I open the database only once.

In my case I use one goroutine (thread) for writes, with a lock, but it makes a lot of writes.

As I mentioned above, I make writes using only one goroutine and wait for each commit. The writes are small, about 10 KB each, at a rate of around 10 writes per second. But my SSDs are not fancy new ones and are a bit slow, so I see some retryable errors: 1007, 1021 and 1009. Memory usage was near 4 GB per process. I don’t use any options for transactions.

My main concern is that after the OOM (or memory leak, or segfault) the cluster can’t heal itself, and I end up with fully broken storage; I can’t even access my data. It keeps trying to restart processes, without luck, near 4 GB of RAM.

Full log of a process with the OOM: https://raw.githubusercontent.com/brk0v/logs/master/trace.192.168.001.100.4501.1528107868.HvCax5.6.xml

I’ll grep and send you more logs if you need them.

Thank you!

I added memory to each cluster node.

So now I have:

/usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --memory 10GiB --public_address auto:4500 --storage_memory 2GiB --tls_certificate_file /etc/foundationdb/fdb.pem --tls_key_file /etc/foundationdb/fdb-key.pem --tls_plugin /usr/lib/foundationdb/plugins/libFDBLibTLS.so

I restarted my 3-node cluster and it has been recovering for 2 hours, consuming memory and CPU, but I can’t see any progress…

Memory usage:

 5383 foundat+  20   0 9762960 5.228g  21052 R  99.7  4.2 106:10.76 fdbserver
 5382 foundat+  20   0 8476424 4.051g  20900 S  56.3  3.2  91:55.13 fdbserver

I have 2 instances on each server, on ports 4500 and 4501.

$ fdbcli:

fdb> status details

WARNING: Long delay (Ctrl-C to interrupt)

Using cluster file `/etc/foundationdb/fdb.cluster'.

Initializing new transaction servers and recovering transaction logs.

The cluster has some unreachable processes.

$ cat /etc/foundationdb/fdb.cluster

# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
test_cluster:6KSAuefFw2PtTjwhomsLUdG4wQPeVLmY@192.168.1.29:4500:tls,192.168.1.76:4500:tls,192.168.1.78:4500:tls

I found an interesting error in the logs:

<Event Severity="40" Time="1528068636.898064" Type="SharedTLogFailed" Machine="192.168.1.100:4501" ID="87e809b0ea61cee3" Reason="Error" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" logGroup="default" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x13c7f0c 0x13c6e9a 0xa26332 0xa26c86 0xa26e64 0xa182b3 0xa1ddc6 0xa343fb 0xa34bb7 0x4e2875 0x12988d0 0x1298b58 0x47b398 0x1399e13 0x42a627 0x7fe958ccb2e1"/>
TraceEvent::backtrace(std::string) at /usr/include/c++/4.9/bits/basic_string.h:2424
 (inlined by) TraceEvent::backtrace(std::string) at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:874
TraceEvent::~TraceEvent() at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ?? at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) TraceEvent::~TraceEvent() at /tmp/foundationdb/tmp/build/foundationdb/flow/Trace.cpp:908
endRole(UID, std::string, std::string, bool, Error) at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ?? at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) endRole(UID, std::string, std::string, bool, Error) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.cpp:424
(anonymous namespace)::WorkerHandleErrorsActorState<(anonymous namespace)::WorkerHandleErrorsActor>::a_body1loopBody1when1(ErrorInfo const&, int) at /usr/include/c++/4.9/bits/basic_string.h:301
 (inlined by) ?? at /usr/include/c++/4.9/bits/basic_string.h:547
 (inlined by) a_body1loopBody1when1 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.cpp:145
ActorSingleCallback<(anonymous namespace)::WorkerHandleErrorsActor, 0, ErrorInfo>::fire(ErrorInfo const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:939
(anonymous namespace)::ForwardErrorActorState<(anonymous namespace)::ForwardErrorActor>::a_body1Catch2(Error const&, int) [clone .isra.1119] at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:532
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:837
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.cpp:108
forwardError(PromiseStream<ErrorInfo> const&, char const* const&, UID const&, Future<Void> const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:636
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:693
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.g.cpp:410
 (inlined by) forwardError(PromiseStream<ErrorInfo> const&, char const* const&, UID const&, Future<Void> const&) at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.cpp:99
(anonymous namespace)::WorkerServerActorState<(anonymous namespace)::WorkerServerActor>::a_body1cont12cont3loopBody1when4(InitializeTLogRequest const&, int) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:837 (discriminator 2)
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/./flow/ActorCollection.h:61 (discriminator 2)
 (inlined by) a_body1cont12cont3loopBody1when4 at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/worker.actor.cpp:729 (discriminator 2)
ActorSingleCallback<(anonymous namespace)::WorkerServerActor, 10, InitializeTLogRequest>::fire(InitializeTLogRequest const&) at /tmp/foundationdb/tmp/build/foundationdb/./fdbrpc/fdbrpc.h:110
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/WorkerInterface.h:69
 (inlined by) fire at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:938
NetNotifiedQueue<InitializeTLogRequest>::receive(ArenaReader&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:532
 (inlined by) NetNotifiedQueue<InitializeTLogRequest>::receive(ArenaReader&) at /tmp/foundationdb/tmp/build/foundationdb/./fdbrpc/fdbrpc.h:204
(anonymous namespace)::DeliverActorState<(anonymous namespace)::DeliverActor>::a_body1cont1(int) [clone .isra.330] at /tmp/foundationdb/tmp/build/foundationdb/fdbrpc/FlowTransport.actor.cpp:443
ActorCallback<(anonymous namespace)::DeliverActor, 0, Void>::fire(Void const&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:929
void SAV<Void>::send<Void>(Void&&) at /tmp/foundationdb/tmp/build/foundationdb/./flow/flow.h:381
N2::Net2::run() at /tmp/foundationdb/tmp/build/foundationdb/flow/flow.h:720
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2.actor.cpp:467
 (inlined by) ?? at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2.actor.cpp:474
 (inlined by) N2::Net2::run() at /tmp/foundationdb/tmp/build/foundationdb/flow/Net2.actor.cpp:628
main at /tmp/foundationdb/tmp/build/foundationdb/fdbserver/fdbserver.actor.cpp:1617 (discriminator 2)"/>

Another strange thing is the size of the data dir. My data should be around 40 GB, but on disk I have a log file of 113 GB:

drwxr-xr-x 2 foundationdb foundationdb 4.0K Jun  1 01:21 .
drwxr-xr-x 4 foundationdb foundationdb   30 May 31 14:31 ..
-rw---S--- 1 foundationdb foundationdb 9.7M Jun  4 15:57 coordination-0.fdq
-rw---S--- 1 foundationdb foundationdb  15M Jun  4 15:57 coordination-1.fdq
-rw---S--- 1 foundationdb foundationdb 113G Jun  4 15:57 log-435dc58687efd6f096a1b9a63a37da1b.sqlite <--------- 113 GB
-rw---S--- 1 foundationdb foundationdb 117M Jun  4 15:57 log-435dc58687efd6f096a1b9a63a37da1b.sqlite-wal
-rw---S--- 1 foundationdb foundationdb 1.6G Jun  4 15:57 logqueue-435dc58687efd6f096a1b9a63a37da1b-0.fdq
-rw---S--- 1 foundationdb foundationdb 1.6G Jun  4 11:38 logqueue-435dc58687efd6f096a1b9a63a37da1b-1.fdq
-rw---S--- 1 foundationdb foundationdb 4.0K May 31 14:27 processId
-rw---S--- 1 foundationdb foundationdb  12G Jun  4 15:59 storage-ca2be1f34b74669e2a00a347290db311.sqlite
-rw---S--- 1 foundationdb foundationdb  28M Jun  4 15:59 storage-ca2be1f34b74669e2a00a347290db311.sqlite-wal

It looks like the resident memory is about 4GB, but the virtual memory for the process (which is the number being limited) is 8GB. See this event from your log file just before the process died:

... Type="ProcessMetrics" ... Memory="8589721600" ResidentMemory="4228780032" ...

Is the disk that your server processes are using functioning correctly? An error like this usually indicates some problem using the disk.

A correctly configured cluster should be able to keep running with the loss of any process in the cluster. More generally, in double replication you should be able to set it up so that you could lose any set of processes in the same fault domain (e.g. you could lose one machine, or one rack, etc. depending on your configuration). I don’t see anything immediately wrong with your configuration from the details you’ve posted, so something may be going wrong with the recovery process.

How many of your processes are running out of memory and/or having problems with IO errors? Is this happening on more than one node? If you are having memory problems on more than one process, can you check whether this large memory usage lines up with the large log-* files you reported?

The files that are large here are the transaction log’s files. These can get large if there is some storage server in the cluster that either gets disconnected from the cluster or is not able to keep up with the write rate. In these cases, the transaction log has to store all of your mutations on disk until the storage server can fetch them and make them durable. If the storage server is completely dead, this data should get deleted once it’s been replicated elsewhere. If the storage server is alive but falling behind, then I think this file can grow indefinitely if one storage server is having a harder time keeping up than others.

If you are in fact saturating the cluster (you can tell by running status in fdbcli and looking to see if there is a ‘Performance limited by’ section), I’d recommend turning down your write rate to avoid doing so. As mentioned before, if the cluster is saturated, then you’ll be able to reduce your write rate to some extent without affecting throughput.

I have 2 processes on each node, and with default settings I saw OOMs in the logs on all instances.

So this file is a WAL for a storage server, isn’t it? And all writes go through this file (append only?) and are then applied to the main sqlite db?

Yes, but the thing is that I have a limited time window, so I need to import data as fast as I can.
To do that I need to start over with a bigger memory limit and a smaller number of fdbserver processes, to save disk space for transaction logs, if I understand correctly how fdb works with data.

On all 6 instances? Are there some of these instances that don’t have large log-* files? Also, I’d like more details about the disk errors you are encountering.

Not exactly. The storage servers do actually have a WAL in the ssd storage engine (e.g. see your file storage-ca2be1f34b74669e2a00a347290db311.sqlite-wal). The transaction logs are part of the cluster’s transaction authority and are the component that makes data durable during a commit. They do typically write data to an append-only log, and there is an asynchronous process that moves data from the logs to the storage servers. The logs typically serve data to the storage servers from memory rather than the disk files, however, and the logqueue-* files on disk are only used for recovery. There is also no particular relation in the counts of logs and storage servers in a cluster.

The files that are getting large in your case are not the append-only files, though. Instead, when a log has lots of data that hasn’t been read by some storage server(s), it eventually must flush it from memory into another on-disk data structure (in this case, a B-tree).

If you are saturating the cluster (i.e. trying to write more than its max throughput), then you can turn down the write rate without impacting actual throughput. For example, if the cluster is only able to take 10MB/s of writes and you are trying to write 20MB/s, then the actual throughput is only going to be 10MB/s. You can reduce how much you are trying to write to 10MB/s and you’ll still be writing 10MB/s, but now you’ll also improve your latencies and potentially reduce the stress being applied to various parts of the cluster. Normally, we would recommend running a cluster below a saturating level, though I can understand wanting to get maximum throughput for a bulk-load situation when the cluster is not otherwise in use.
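
To make that concrete, a throttled loader in Go could look roughly like this. It is only a sketch: batches, the kv contents, and targetTPS are placeholders for whatever your loader actually produces and whatever rate you measure the cluster can comfortably sustain.

import (
    "time"

    "github.com/apple/foundationdb/bindings/go/src/fdb"
)

// throttledLoad caps the offered load at roughly targetTPS transactions per second.
func throttledLoad(db fdb.Database, batches [][]fdb.KeyValue, targetTPS int) error {
    tick := time.NewTicker(time.Second / time.Duration(targetTPS))
    defer tick.Stop()
    for _, batch := range batches {
        <-tick.C // wait for the next slot before starting another commit
        _, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
            for _, kv := range batch {
                tr.Set(kv.Key, kv.Value)
            }
            return nil, nil
        })
        if err != nil {
            return err
        }
    }
    return nil
}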

Normally you shouldn’t need a bigger memory limit. Without knowing why the processes are failing, it’s hard for me to say whether or not that will help. It’s not clear yet why the process is using so much memory and how much it’s actually going to use. However, if it is using excessive memory that sounds like a bug and it’s something we’ll want to address.

EDIT: Sorry, I got interrupted before I could finish this thought.

You shouldn’t need to reduce the number of storage processes in your cluster. The total storage space used by the storage servers shouldn’t really change much with the number of storage instances, and if you are saturating and reduce your write rate, it can reduce the amount of disk space used by the logs. One thing you can do (and that we recommend for production deployments) is to separate the logs and storage servers onto separate disks. You would do this by setting up some processes in your cluster to run on separate disks and configure their process class (e.g. by setting class=transaction in foundationdb.conf). This is primarily beneficial not for space reasons, but for performance. The log processes fsync frequently, and this can impact the performance of reads from the storage servers. Granted this matters less while you are running a write-only workload, but it’s good general advice.
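
As an illustrative sketch only (the ports and the first path come from the command line you posted; the second data directory is an assumption for wherever a separate disk would be mounted), the per-process sections in foundationdb.conf could look like this:

[fdbserver.4500]
class = storage
datadir = /var/lib/foundationdb/data/4500

[fdbserver.4501]
class = transaction
## assumption: a separate disk mounted here for the transaction log
datadir = /mnt/tlog/foundationdb/data/4501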

2 machines have OutOfMemory errors in the logs for each port (in my case 4500 and 4501), and by the way, I see these errors in today’s logs.
1 server doesn’t have OOM errors in its logs, but today it ran out of disk space without any write load. Another strange thing is the directory listing for this server’s data dir:

$ ls -la /var/lib/foundationdb/data/4501

A lot of 0-size files and a second big log file.

Is this B-tree file used for answering requests? If I send a get command, will I get the value for a key from this log-file data structure?

To find my saturation level I should use status in fdbcli and read the status of the storage, right? Are there no other benchmark tools?
From a golang program I don’t see any way to tell whether my cluster is already saturated or not.

UPDATE: found a key for reading status: \xff\xff/status/json
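
In Go it can be read with something like the sketch below. The SetReadSystemKeys call is my guess; depending on the client version the \xff\xff special keys may be readable without any option.

package main

import (
    "fmt"
    "log"

    "github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
    fdb.MustAPIVersion(510) // match your installed client version
    db := fdb.MustOpenDefault()
    raw, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
        tr.Options().SetReadSystemKeys() // may not be needed on all versions
        return tr.Get(fdb.Key("\xff\xff/status/json")).Get()
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(raw.([]byte))) // machine-readable status JSON
}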

Yes, from my point of view there is a bug. It looks like FoundationDB has a memory leak under saturated conditions.

The other thing I want to emphasise is that the API lets you write data at a higher rate than the cluster can handle. From the producer’s side I successfully set all my keys/values, but after a while I realized that the cluster was saturated, couldn’t rebalance data, and was restarting endlessly with internal OOMs, breaking things and using additional disk space.

But anyway, I need a storage server on this server, right? The sentence about classes isn’t clear to me. How many of them should I use in the case of a 3-node cluster with double ssd storage?

What about the disk IO errors? I’d like to get an idea what the IO errors were to form a more complete picture of what happened.

Are there any Severity="40" events in this process’s logs? The space is probably being used trying to recover the cluster, not because it’s taking writes, so we’d want to figure out why recovery is failing. You say this host isn’t having memory issues; is it having disk issues?

No, reads are done from the storage servers. The storage servers get their data from the logs, and typically this data is served straight out of memory. If we have to spill some data from memory onto disk (into this B-tree for the ssd storage engine), then the log will serve the data out of the B-tree instead.

At the moment, my recommendation is to test your cluster and measure empirically how much it’s able to get done running your workload. You can do this either by measuring how much work you are doing in your client or by looking at some of the status metrics. Once you have a good idea of this, I would then run at some level below the saturation point (for a bulk-write scenario where no one else is using the database, maybe a high percentage of saturation, like 80%, and in other cases maybe 50% or less). If instead your goal is to achieve a certain amount of throughput, then you can increase the size of your cluster (and/or configure more logs, proxies, or resolvers) until that throughput can be comfortably achieved. I think trying to read the status output in your program and adapt your workload to it might prove a bit tricky to make work well in general.

There is a proposed feature that will make it easier to set up what we call batch workloads to run at higher rates. These run at a lower priority and the goal is to make them automatically shut off at a point lower than saturation. See Batch priority enhancements · Issue #377 · apple/foundationdb · GitHub.

It’s not clear exactly what the problem is here. For example, rather than a memory leak, it may be that it takes a lot of memory to recover from large log files. It’s also not yet clear if the disk IO errors factor into this at all. At any rate, as I mentioned, this is not the intended behavior.

The cluster uses something we call Ratekeeper to limit how much work you are doing, and it does so by introducing latency at the start of a transaction (specifically, when getting a read version). There are some ways you can saturate the cluster that Ratekeeper isn’t aware of (e.g. by having too much work for the proxies), and there are also some ways in which Ratekeeper tries to balance multiple goals.

For example, we allow the storage servers in one fault domain to fall arbitrarily far behind if they can’t keep up. Ratekeeper could choose to reduce the number of transactions it is willing to start to prevent these storage servers from falling behind, but then we could run into problems with a single slow storage server impacting the performance of an entire cluster. Normally, we’ve seen the current behavior work pretty well because of the way load gets balanced in the cluster. However, it may be that in very small clusters some processes can experience significantly more load than others and then be allowed to fall behind, requiring more disk space on the logs.

We’ve established that there is likely some bug here that causes high memory usage in some scenario that we haven’t quite identified, so I’m not sure I would describe this as the API letting you do this so much as it being a situation where things aren’t working as intended.

It is not required to have storage servers on every node. By default, a cluster has a number of logs equal to its replication, so in double it would have 2 logs. If you are going to go the route of configuring some processes to act specifically as logs, then I would have at least 3 processes configured as transaction class so that you have a backup if one fails. In triple replication, I would have 5 (3 + 2 backup). It’s also possible to configure your cluster to run more logs to support a higher write throughput (e.g. using configure logs=8 in fdbcli), in which case you would want to have that many transaction class processes, plus some backups.
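
Concretely, for your double-replicated cluster that could be as simple as running the following in fdbcli and then checking the recruited roles with status:

fdb> configure logs=3
fdb> status details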

Now I can see only "No space left on device" errors, but I think that’s not the cause. My data set is about 40 GB, as I mentioned before.

<Event Severity="10" Time="1528154524.618570" Type="AsyncFileKAIOOpenFailed" Machine="129.168.1.100:4500" ID="0000000000000000" Filename="/var/lib/foundationdb/data/4500/logqueue-8b2b67814ccc537bc3635f6bae188ff4-0.fdq" Flags="f0006" OSFlags="4242" mode="02600" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" UnixErrorCode="1c" UnixError="No space left on device" logGroup="default"/>

I have logs with Severity="40", but they all come after the space issues above.

I’m going to update the settings to hold more logs and restart my load process to reproduce the issue. I’ve done it before with version 5.1.

I don’t see any errors in the logs or dmesg.

Mmm… sorry for the architecture question. I’ve read https://github.com/apple/foundationdb/blob/master/documentation/sphinx/source/kv-architecture.rst, but I still want to clarify the commit path and your comment.

So with a slow disk and/or a high write rate, I can get into a situation where I successfully set a key/value pair in a saturated cluster but can’t get the key from the storage servers in the next transaction? Or does the first transaction return no errors only once the storage engines have already received all the changes?

Interesting, thank you for the link.

As I mentioned before, I’ll configure more logs during my next load attempt.

So if I configure some processes with the log class, those processes can’t do any other work and can’t be promoted to a proxy, for instance, right?

Was the earlier IO error you posted also caused by the disk being out of space? It unfortunately doesn’t say in the SharedTLogFailed event it seems. If so, then I agree it seems like a symptom rather than a cause.

When you commit mutations to the database, you get back a response when that data is durable on the logs at the desired replication factor. The data is moved from the logs to the storage servers asynchronously, and this process does not block the commit at all.

When you start a new transaction after you’ve committed something, your new transaction will be assigned a read version that is at least as large as your recent commit, meaning that your transaction will see the results of that commit. It is possible that the storage servers don’t yet have this version, so if you try to read it from them the read will block waiting for it to arrive or even fail back out to the client with a retryable error (called future_version). In practice, I think it’s rare to have to be subjected to a delay when reading data, although you have reported seeing the future_version error (1009). It may just be more common when saturating the cluster.
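
To tie this back to your Go code (assuming a db opened as in the earlier sketch): if you use db.Transact or db.ReadTransact, the binding already handles this; when the inner function returns a retryable error such as 1009 (future_version), it backs off and re-runs the function, so you only see the error if it keeps failing past any retry or timeout limits you have set. The key below is hypothetical.

_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
    // Returning a retryable error here is what triggers the binding's
    // back-off-and-retry loop.
    v, err := tr.Get(fdb.Key("some/example/key")).Get()
    if err != nil {
        return nil, err
    }
    _ = v // use the value here
    return nil, nil
})
if err != nil {
    log.Fatal(err) // only non-retryable (or retry-limit-exceeded) errors reach here
}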

In the case where we allow a single fault domain to fall behind (this requires double or higher replication), it’s possible that at read time you try to read from the storage server that is behind. In this case, it will answer quickly to tell you that it has fallen behind, and the client will automatically make the request to another replica that has the data (which Ratekeeper shouldn’t be allowing to fall behind).

The transaction class processes can actually run some other roles, though it should avoid creating storage roles on them. If you want to separate other types, such as proxies, we recommend also creating processes with the stateless class. See Configuration — FoundationDB 7.1, although the numbers here are specific to triple-replicated clusters and assume you are trying to run a largish cluster. It may be more trouble than it’s worth to try to configure these classes on smaller clusters, though you can certainly try it to see if it makes a difference.

This time I used a lower write rate: fewer threads and a smaller bulk chunk per transaction. I also set 3 log roles instead of the default 2 for my double replication. With that, all the data was inserted without strange memory or disk usage. But now I see weird behaviour with the cluster roles. I think this could be the cause of the issue in my previous tries.

So I have the following roles among my processes:

    "address": "192.168.1.76:4501:tls",
        "role": "master"
        "role": "log"
        "role": "storage",
    "address": "192.168.1.29:4500:tls",
        "role": "log"
        "role": "storage",
    "address": "192.168.1.29:4501:tls",
        "role": "storage",
        "role": "resolver"
    "address": "192.168.1.78:4500:tls",
        "role": "log"
        "role": "storage",
    "address": "192.168.1.76:4500:tls",
        "role": "cluster_controller"
        "role": "storage",
    "address": "192.168.1.78:4501:tls",
        "role": "proxy"
        "role": "storage",

And after 10 seconds all roles have been migrated to other processes:


    "address": "192.168.1.76:4501:tls",
        "role": "master"
        "role": "log"
        "role": "storage",
    "address": "192.168.1.29:4500:tls",
        "role": "cluster_controller"
        "role": "log"
        "role": "storage",
    "address": "192.168.1.29:4501:tls",
        "role": "proxy"
        "role": "storage",
    "address": "192.168.1.78:4500:tls",
        "role": "log"
        "role": "storage",
    "address": "192.168.1.76:4500:tls",
        "role": "proxy"
        "role": "storage",
    "address": "192.168.1.78:4501:tls",
        "role": "storage",
        "role": "resolver"

According to the documentation, the Cluster Controller is in charge of assigning roles. So when I get a new cluster controller I get a whole new topology, and all current transactions are aborted. But I can’t figure out why this is happening. I see a lot of SlowTask, TLogQueueCommitSlow and ConnectionTimeout events in the logs.

Can I increase these timeouts?

UPDATE: and I see memory usage increasing after the restart. I think the reason is these frequent role changes.

UPDATE2: I found the following timeout for TLogs:

https://github.com/apple/foundationdb/blob/master/fdbserver/Knobs.cpp#L30

What do you think about increasing this one?
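
If I understand the configuration handling correctly, knobs can be overridden per process, either as --knob_<name> command-line arguments to fdbserver or as knob_<name> parameters in the [fdbserver] section of foundationdb.conf. Something like the sketch below, where the knob name is only a placeholder for whichever knob that line actually defines:

[fdbserver]
## placeholder name -- substitute the real knob name from Knobs.cpp
knob_example_tlog_timeout = 0.8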

Thank you for the clarification; yes, I see these retryable errors from time to time.

Ok, I see, thank you.

UPD: Some greps from the logs that look suspicious:

<Event Severity="10" Time="1528299228.998529" Type="MasterProxyTerminated" Machine="192.168.1.29:4500" ID="ce3a32509802dc79" Error="master_tlog_failed" ErrorDescription="Master terminating because a TLog failed" ErrorCode="1205" logGroup="default"/>

<Event Severity="20" Time="1528299665.950467" Type="clusterWatchDatabaseRetrying" Machine="192.168.1.29:4500" ID="11fbe6492ef4e0f5" Error="no_more_servers" ErrorDescription="Not enough physical servers available" ErrorCode="1008" logGroup="default"/>

<Event Severity="20" Time="1528298683.457009" Type="N2_ReadProbeError" Machine="192.168.1.29:4500" ID="e47fd45a318e7a4c" Message="125" SuppressedEventCount="0" logGroup="default"/>

<Event Severity="10" Time="1528299485.977833" Type="TLSConnectionRecvError" Machine="192.168.1.29:4500" ID="9a01b9f61228f725" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" logGroup="default"/>

<Event Severity="10" Time="1528304144.520750" Type="FKBlockFail" Machine="192.168.1.76:4501" ID="dd664403aa073383" FKID="d48d2b6589785ada" Error="operation_cancelled" ErrorDescription="Asynchronous operation cancelled" ErrorCode="1101" SuppressedEventCount="0" logGroup="default"/>

<Event Severity="20" Time="1528304127.760103" Type="FailureMonitorClientSlow" Machine="192.168.1.76:4501" ID="0000000000000000" Elapsed="0.391167" Expected="0.1" logGroup="default"/>

I think that it is possible to fill a memory engine such that it crashes on restart. The only way I have gotten out of that situation is to increase the memory limit.

Yes, but additional memory doesn’t fix the issue of roles continuously migrating among the cluster nodes.
It looks like some timeout issue, but I can’t find the cause.