Document Layer OOM

I have a single instance of Document Layer running in a k8s cluster: Document Layer 1.6.3 and FDB 6.1.8, with 2 CPU/4 GB RAM requested and 4 CPU/4 GB RAM limits for the pod. After testing at ~2.5 reads/second for several hours, the pod fails with an OOM error. Network traffic is mild, about 80 kB/sec. In the logs, there are 3 events I’m not familiar with that recur at a regular cadence:

<Event Severity="30" Time="1564750938.622142" Type="Unable to write status" ID="0000000000000000" Error="Request for future version" Machine="xxx.xxx.xxx.xxx:27016" LogGroup="default" />
   
<Event Severity="20" Time="1564745646.132590" Type="N2_ReadError" ID="9e9b2fdcfd1e419a" SuppressedEventCount="7" Message="2" Machine="xxx.xxx.xxx.xxx:27016" LogGroup="default" />
   
<Event Severity="20" Time="1564745116.640767" Type="Net2SlowTaskTrace" ID="0000000000000000" TraceTime="1564745116.640827" Trace="addr2line -e fdbdoc.debug -p -C -f -i 0x7ff97273c890 0x7ff9712439e2 0x7ff9712477d8 0x7ff97124a2ed 0x8406b8 0x83f7f9 0x88df71 0x6682cb 0x670256 0x673df6 0x677431 0x452c36 0x457d66 0x45656a 0x41a69b 0x6b857f 0x447d00 0x7ff9711d4b97 0x40ca69" Machine="xxx.xxx.xxx.xxx:27016" LogGroup="default" />

Any thoughts on logs and/or troubleshooting steps?
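
For triage, it can help to tally trace events by type and severity. Below is a minimal sketch, assuming each event sits on one line as a self-contained XML element (as in the excerpts above) and assuming FoundationDB's usual severity scale (10 info, 20 warn, 30 warn-always, 40 error). The `Trace` attribute of the slow-task event is dropped for brevity, and the helper name `tally` is made up for illustration:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Severity labels: an assumption based on FoundationDB's trace conventions.
SEVERITY = {5: "debug", 10: "info", 20: "warn", 30: "warn-always", 40: "error"}

# The three events from the post (Trace attribute omitted for brevity).
LINES = [
    '<Event Severity="30" Time="1564750938.622142" Type="Unable to write status" '
    'ID="0000000000000000" Error="Request for future version" '
    'Machine="xxx.xxx.xxx.xxx:27016" LogGroup="default" />',
    '<Event Severity="20" Time="1564745646.132590" Type="N2_ReadError" '
    'ID="9e9b2fdcfd1e419a" SuppressedEventCount="7" Message="2" '
    'Machine="xxx.xxx.xxx.xxx:27016" LogGroup="default" />',
    '<Event Severity="20" Time="1564745116.640767" Type="Net2SlowTaskTrace" '
    'ID="0000000000000000" TraceTime="1564745116.640827" '
    'Machine="xxx.xxx.xxx.xxx:27016" LogGroup="default" />',
]


def tally(lines):
    """Count events per (Type, severity label)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue  # skip blank lines and any enclosing tags
        event = ET.fromstring(line)
        label = SEVERITY.get(int(event.get("Severity")), "unknown")
        counts[(event.get("Type"), label)] += 1
    return counts


counts = tally(LINES)
for (event_type, label), n in sorted(counts.items()):
    print(f"{event_type:25s} {label:12s} {n}")
```

Under that severity scale, none of the three events is an error-level event; they are all warnings.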

Here are the resource usage charts for the pod:

UPDATE: Tested with Document Layer v1.7.1 with same results.

UPDATE: By 2.5 reads/second I mean 2.5 MongoDB API collection.find_one({_id: doc_id}) calls per second. This workload results in the following metric logs:

<Event Severity="10" Time="1564753943.547671" Type="ConsoleMetric" ID="0000000000000000" MetricId="queryLatency_us" MetricType="TIMER" Count="211" Sum="356157" Avg="1687.95" Max="5105" Min="22" Top25%="28" Top50%="39" Top90%="4647" Top99%="4972" Top99.99%="5105" Machine="xxx:27016" LogGroup="default" />
<Event Severity="10" Time="1564753943.547671" Type="ConsoleMetric" ID="0000000000000000" MetricId="messageRate" MetricType="METER (rate per second)" Count="211" Sum="211" Avg="42.864" Max="1" Min="1" Top25%="1" Top50%="1" Top90%="1" Top99%="1" Top99.99%="1" Machine="xxx:27016" LogGroup="default" />
<Event Severity="10" Time="1564753943.547671" Type="ConsoleMetric" ID="0000000000000000" MetricId="messageLength" MetricType="HISTOGRAMS" Count="211" Sum="43792" Avg="207.545" Max="282" Min="58" Top25%="88" Top50%="264" Top90%="282" Top99%="282" Top99.99%="282" Machine="xxx:27016" LogGroup="default" />
<Event Severity="10" Time="1564753943.547671" Type="ConsoleMetric" ID="0000000000000000" MetricId="queryRate" MetricType="METER (rate per second)" Count="211" Sum="211" Avg="42.8646" Max="1" Min="1" Top25%="1" Top50%="1" Top90%="1" Top99%="1" Top99.99%="1" Machine="xxx:27016" LogGroup="default" />
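
One thing the `queryLatency_us` line shows is that the mean sits far above the median (39 µs at Top50% versus 4647 µs at Top90%). The sketch below re-derives the mean from `Sum` and `Count` as a sanity check. It uses a regular expression rather than an XML parser because attribute names such as `Top25%` are not valid XML names; the helper name `parse_attrs` is made up for illustration:

```python
import re

LINE = (
    '<Event Severity="10" Time="1564753943.547671" Type="ConsoleMetric" '
    'ID="0000000000000000" MetricId="queryLatency_us" MetricType="TIMER" '
    'Count="211" Sum="356157" Avg="1687.95" Max="5105" Min="22" '
    'Top25%="28" Top50%="39" Top90%="4647" Top99%="4972" Top99.99%="5105" '
    'Machine="xxx:27016" LogGroup="default" />'
)

# name="value" pairs; names may contain '.' and '%' (e.g. Top99.99%).
ATTR = re.compile(r'([\w.%]+)="([^"]*)"')


def parse_attrs(line):
    """Return the attributes of one trace event line as a dict of strings."""
    return dict(ATTR.findall(line))


metric = parse_attrs(LINE)
mean_us = int(metric["Sum"]) / int(metric["Count"])
median_us = int(metric["Top50%"])
p90_us = int(metric["Top90%"])

print(f"mean   = {mean_us:.2f} us (reported Avg = {metric['Avg']})")
print(f"median = {median_us} us, p90 = {p90_us} us")
```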

Is this pod only for the document layer, or for FoundationDB also?

That is a weird collection of errors though. Would you be able to pastebin the full trace logs?

What version of the document layer did the Net2SlowTaskTrace line come from?

The pod is only running Document Layer. The pasted errors came from version 1.6.3. It may be Monday before I can get the full trace.

The FDB cluster is in a separate VPC with a peering connection. Functionally, the doc layer works fine with the resources I use the layer for. There appears to be something going on with the Net2 code or my configuration.

Recreated the leak condition this morning under the same load and grabbed a trace. Document Layer version 1.7.1:

https://pastebin.com/mVUx8sL2

Can you also share the queries? It is quite easy to cause an OOM with some kinds of queries. For example, sort/orderby doesn’t use indexes, so if you are using orderby, it’s quite possible for Doc Layer to try to keep the entire data set in memory to sort it. Having said that, I wouldn’t rule out memory leaks.

Under this workload, there are roughly 2 requests per second. Each request requires:

  1. an auth step that checks permissions of requester, grabbing a single document that is quite small and from a collection that has maybe dozens of documents

     self.collection.find_one({DOCID: rsc_id})
    
  2. a meta lookup on the resource being modified. This has the same exact query and collection parameters as above

Occasionally, there is also a meta update push on a separate resource in the back-end:

 self.collection.update(
        {'_id': self._meta_id_, tag_str: {'$exists': False}},
        {'$set': {tag_str: ''}}
    )

I did test the update above as being a source of the issue. I removed the conditional update and ran the workload without updating the meta. The problem persisted.

All other FDB activity around this request workload is outside of Document Layer, but the FDB server itself has a healthy workload in parallel. This is an ETL workload where the requests are dumps of time series data. In production, the load would be 10-200x that of this test. The idea is to scale the services and Document Layer.

When I took a step back and looked at the memory history of the server running Document Layer, there is always memory creep, and its rate increases with the workload. There is also a monitoring workload that queries another resource once per minute:

self.collection.find(
        {},
        {FIELDS.data: 0}
    )

There is also an N2_ReadError event in the trace at about the same frequency during the ‘slow’ times when I’m not load testing. My hypothesis right now is that the memory creeps in proportion to the number of N2_ReadError events.

For better context of the origin of the server pod, the docker image for the pod is built from the repo’s Dockerfile: https://github.com/FoundationDB/fdb-document-layer/blob/master/packaging/docker/Dockerfile

The FDB_CLIENT_VERSION arg is set to 6.1.8, which is the version of the FDB server that the doc layer connects to.

I have built with FDB_DOC_VERSION set to 1.6.3 and 1.7.1. Both of these configurations result in the same Net2 errors.

The last docker build I tried installed the entire client binary package instead of copying the libfdb so file. Config: FDB Server 6.1.8, Doc Layer 1.7.1. The problem persists. I turned on verbose logging, but it is not giving me much in terms of a smoking gun. There are 3 distinct query types, and for the most part they execute with no problems:

<Event Severity="10" Time="1565037006.426282" Type="BD_serverClosedConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.426282" Type="BD_serverNewConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.427283" Type="BD_processRequest" ID="0000000000000000" Message="QUERY: { <connection_request> }, collection=admin.$cmd, flags=0, numberToSkip=0, numberToReturn=-1 (HEADER: messageLength=264, requestID=1706328732, responseTo=0, opCode=2004)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.427283" Type="BD_doRun" ID="0000000000000000" Reply="REPLY: documents=[ { ismaster: true, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1565037006427), minWireVersion: 0, maxWireVersion: 3, ok: 1.0 } ], responseFlags=0, cursorID=0, startingFrom=0 (HEADER: messageLength=0, requestID=0, responseTo=1706328732, opCode=1)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.427283" Type="BD_msgReply" ID="0000000000000000" Message="REPLY: documents=[ { ismaster: true, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1565037006427), minWireVersion: 0, maxWireVersion: 3, ok: 1.0 } ], responseFlags=0, cursorID=0, startingFrom=0 (HEADER: messageLength=194, requestID=0, responseTo=1706328732, opCode=1)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.427283" Type="BD_serverNewConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.428507" Type="BD_processRequest" ID="0000000000000000" Message="QUERY: { ismaster: 1, client: { driver: { name: &quot;PyMongo&quot;, version: &quot;3.7.1&quot; }, os: { type: &quot;Linux&quot;, name: &quot;Linux&quot;, architecture: &quot;x86_64&quot;, version: &quot;4.14.106-97.85.amzn2.x86_64&quot; }, platform: &quot;CPython 3.6.8.final.0&quot; }, compression: {} }, collection=admin.$cmd, flags=0, numberToSkip=0, numberToReturn=-1 (HEADER: messageLength=282, requestID=245755489, responseTo=0, opCode=2004)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.428507" Type="BD_doRun" ID="0000000000000000" Reply="REPLY: documents=[ { ismaster: true, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1565037006428), minWireVersion: 0, maxWireVersion: 3, ok: 1.0 } ], responseFlags=0, cursorID=0, startingFrom=0 (HEADER: messageLength=0, requestID=0, responseTo=245755489, opCode=1)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.428507" Type="BD_msgReply" ID="0000000000000000" Message="REPLY: documents=[ { ismaster: true, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1565037006428), minWireVersion: 0, maxWireVersion: 3, ok: 1.0 } ], responseFlags=0, cursorID=0, startingFrom=0 (HEADER: messageLength=194, requestID=0, responseTo=245755489, opCode=1)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.428681" Type="BD_processRequest" ID="0000000000000000" Message="QUERY: { _id: <rsc_id>; }, collection=<some-collection>, flags=4, numberToSkip=0, numberToReturn=-1 (HEADER: messageLength=88, requestID=1078780340, responseTo=0, opCode=2004)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.432421" Type="BD_construct_filter_plan" ID="0000000000000000" source_plan="{ type: &quot;table scan&quot; }" pushed_down_into="{ type: &quot;PK lookup&quot;, bounds: { begin: &quot;&quot;imperial&quot;&quot;, end: &quot;&quot;imperial&quot;&quot; } }" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.432421" Type="BD_GetMatchingDocs" ID="0000000000000000" QueryPredicate="AND(ANY(ExtPath(_id) matching EQUALS('<rsc_id>')))" SimplifiedPredicate="AND(ANY(ExtPath(_id) matching EQUALS('<rsc_id>')))" Plan="{ type: &quot;PK lookup&quot;, bounds: { begin: &quot;<rsc_id>;&quot;, end: &quot;<rsc_id>&quot; } }" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.434524" Type="BD_doRun" ID="0000000000000000" Reply="REPLY: documents=[ { <the_correct_document> } ], responseFlags=8, cursorID=0, startingFrom=0 (HEADER: messageLength=0, requestID=0, responseTo=1078780340, opCode=1)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.434524" Type="BD_msgReply" ID="0000000000000000" Message="REPLY: documents=[ { <the_correct_document> } ], responseFlags=8, cursorID=0, startingFrom=0 (HEADER: messageLength=128, requestID=0, responseTo=1078780340, opCode=1)" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.449930" Type="BD_serverClosedConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.450067" Type="BD_serverClosedConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.450067" Type="BD_serverNewConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />

Then, randomly, for any of the 3 query types, I will receive:

<Event Severity="20" Time="1565037006.875053" Type="N2_ReadError" ID="5d05202df29b42ce" SuppressedEventCount="29" Message="2" Machine="ip:27016" LogGroup="default" />
<Event Severity="10" Time="1565037006.875053" Type="BD_serverClosedConnection" ID="0000000000000000" Machine="ip:27016" LogGroup="default" />

The N2_ReadError is logged just before the serverClosed event, and always with the same timestamp. There is no apparent pattern in which query it follows. It does appear to happen about once per second, judging by the timestamps:

1565037009.232541
1565037010.279092
1565037011.403663
1565037012.414639
1565037013.510789
1565037014.520024
1565037015.523663
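
A quick way to check the "about once per second" reading is to difference the timestamps listed above. A small sketch using only those seven values:

```python
from statistics import mean

# N2_ReadError timestamps copied from the trace excerpt above.
stamps = [
    1565037009.232541,
    1565037010.279092,
    1565037011.403663,
    1565037012.414639,
    1565037013.510789,
    1565037014.520024,
    1565037015.523663,
]

# Gap between each consecutive pair of events, in seconds.
gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]

print("gaps (s):", [round(g, 3) for g in gaps])
print("mean gap (s):", round(mean(gaps), 3))
```

All six gaps fall between 1.0 and 1.2 seconds, which looks more like a periodic event (a timer or a client heartbeat, for instance) than something triggered by individual queries. That is only a guess from the spacing, not something the trace confirms.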

The memory leak continues at 16 MB/min.
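
As a rough cross-check, that rate is consistent with the pod failing after several hours. The sketch below assumes the 4 GB limit is 4096 MB, that growth is linear, and that baseline memory use is zero (so the result is an upper bound). The per-event figure additionally assumes the growth is entirely tied to N2_ReadError events at about one per second, which is only the hypothesis stated earlier:

```python
LIMIT_MB = 4 * 1024      # pod memory limit, taken as 4096 MB
LEAK_MB_PER_MIN = 16     # observed growth rate
ERRORS_PER_SEC = 1.0     # approximate N2_ReadError rate (assumption)


def minutes_until_limit(limit_mb, leak_mb_per_min, baseline_mb=0.0):
    """Minutes of linear growth before the limit is reached."""
    return (limit_mb - baseline_mb) / leak_mb_per_min


minutes = minutes_until_limit(LIMIT_MB, LEAK_MB_PER_MIN)
kb_per_error = LEAK_MB_PER_MIN * 1024 / (60 * ERRORS_PER_SEC)

print(f"time to limit: {minutes:.0f} min (~{minutes / 60:.1f} h)")
print(f"growth per N2_ReadError, if the hypothesis holds: ~{kb_per_error:.0f} KB")
```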

Yeah, I agree with you. This has nothing to do with the kind of requests you are doing. I could reproduce this kind of behavior very easily locally with heavy connection churn and simple commands like isMaster. The leak is more pronounced if I run a command like listDatabases. We probably have more than one kind of leak.
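
For anyone who wants to try the same thing, here is a minimal sketch of a connection-churn loop. It is not the exact script used for the reproduction described above; the function names, the default URI, and the iteration count in the usage comment are all illustrative assumptions. It relies on PyMongo's `MongoClient`, `Database.command`, and `MongoClient.close`:

```python
def churn(make_client, iterations, command="ismaster"):
    """Open a fresh connection, run one admin command, close it; repeat.

    make_client is any zero-argument callable returning a client object
    with an `admin.command(name)` method and a `close()` method.
    """
    completed = 0
    for _ in range(iterations):
        client = make_client()
        try:
            client.admin.command(command)
            completed += 1
        finally:
            client.close()  # forces the server to tear the connection down
    return completed


def pymongo_factory(uri="mongodb://127.0.0.1:27017"):
    """Return a factory for PyMongo clients (PyMongo is imported lazily)."""
    from pymongo import MongoClient

    return lambda: MongoClient(uri)


# Example (needs PyMongo and a running Document Layer on the given URI):
#   churn(pymongo_factory(), 13000, command="listDatabases")
```

Watching the resident memory of the fdbdoc process while this runs should show whether growth tracks the number of connections rather than the queries themselves.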

On the FoundationDB server, we periodically run Valgrind to catch memory leaks, but we have never run it on Document Layer. I have raised a GitHub issue to track this particular problem: https://github.com/FoundationDB/fdb-document-layer/issues/202.

I suspect the usage of Flow, which I may or may not have built with the right debugging setup (I thought I would dive into this, but after this morning’s exercise I’ll leave it to the pros):

Workload: ~ 13000 new connections w/ getDatabases

valgrind --leak-check=yes build/bin/fdbdoc -l 127.0.0.1:27017 -C /etc/foundationdb/fdb.cluster
==6920== Memcheck, a memory error detector
==6920== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6920== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==6920== Command: build/bin/fdbdoc -l 127.0.0.1:27017 -C /etc/foundationdb/fdb.cluster
==6920== 
0FdbDocServer (1.7): listening on 127.0.0.1:27017
^C==6920== 
==6920== Process terminating with default action of signal 2 (SIGINT)
==6920==    at 0x6302BB7: epoll_wait (epoll_wait.c:30)
==6920==    by 0x6D697F: boost::asio::detail::epoll_reactor::run(long, boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>&) (epoll_reactor.ipp:471)
==6920==    by 0x6D79EB: boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) (scheduler.ipp:385)
==6920==    by 0x6D9D59: boost::asio::detail::scheduler::run_one(boost::system::error_code&) (scheduler.ipp:175)
==6920==    by 0x6D1A7B: run_one (io_context.ipp:77)
==6920==    by 0x6D1A7B: N2::ASIOReactor::sleepAndReact(double) (Net2.actor.cpp:994)
==6920==    by 0x6D2FA3: N2::Net2::run() (Net2.actor.cpp:608)
==6920==    by 0x450873: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920== 
==6920== HEAP SUMMARY:
==6920==     in use at exit: 1,305,768 bytes in 13,578 blocks
==6920==   total heap usage: 904,854 allocs, 891,276 frees, 42,884,036 bytes allocated
==6920== 
==6920== 56 bytes in 1 blocks are definitely lost in loss record 1,908 of 2,903
==6920==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6920==    by 0x69C75D: TraceEvent::setNetworkThread() (Trace.cpp:990)
==6920==    by 0x450304: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920== 
==6920== 304 bytes in 1 blocks are possibly lost in loss record 2,759 of 2,903
==6920==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6920==    by 0x40134A6: allocate_dtv (dl-tls.c:286)
==6920==    by 0x40134A6: _dl_allocate_tls (dl-tls.c:530)
==6920==    by 0x4E44227: allocate_stack (allocatestack.c:627)
==6920==    by 0x4E44227: pthread_create@@GLIBC_2.2.5 (pthread_create.c:644)
==6920==    by 0x67C9C7: startThread(void* (*)(void*), void*) (Platform.cpp:2186)
==6920==    by 0x6F4D90: ThreadPool::addThread(IThreadPoolReceiver*) (vector.tcc:98)
==6920==    by 0x6B78C3: TraceLog::open(std::string const&, std::string const&, std::string, std::string const&, unsigned long, unsigned long, Optional<NetworkAddress>) (Trace.cpp:306)
==6920==    by 0x6A63E9: openTraceFile(NetworkAddress const&, unsigned long, unsigned long, std::string, std::string, std::string) (Trace.cpp:632)
==6920==    by 0x45039A: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920== 
==6920== 304 bytes in 1 blocks are possibly lost in loss record 2,760 of 2,903
==6920==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6920==    by 0x40134A6: allocate_dtv (dl-tls.c:286)
==6920==    by 0x40134A6: _dl_allocate_tls (dl-tls.c:530)
==6920==    by 0x4E44227: allocate_stack (allocatestack.c:627)
==6920==    by 0x4E44227: pthread_create@@GLIBC_2.2.5 (pthread_create.c:644)
==6920==    by 0x67E5EC: startThread (Platform.cpp:2186)
==6920==    by 0x67E5EC: setupSlowTaskProfiler() (Platform.cpp:2840)
==6920==    by 0x454610: (anonymous namespace)::SetupActorState<(anonymous namespace)::SetupActor>::a_body1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920==    by 0x44E356: (anonymous namespace)::SetupActor::SetupActor(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920==    by 0x44E3DF: setup(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920==    by 0x4507C3: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920== 
==6920== 304 bytes in 1 blocks are possibly lost in loss record 2,761 of 2,903
==6920==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6920==    by 0x40134A6: allocate_dtv (dl-tls.c:286)
==6920==    by 0x40134A6: _dl_allocate_tls (dl-tls.c:530)
==6920==    by 0x4E44227: allocate_stack (allocatestack.c:627)
==6920==    by 0x4E44227: pthread_create@@GLIBC_2.2.5 (pthread_create.c:644)
==6920==    by 0x67C9C7: startThread(void* (*)(void*), void*) (Platform.cpp:2186)
==6920==    by 0x45462D: (anonymous namespace)::SetupActorState<(anonymous namespace)::SetupActor>::a_body1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920==    by 0x44E356: (anonymous namespace)::SetupActor::SetupActor(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920==    by 0x44E3DF: setup(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920==    by 0x4507C3: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==6920== 
==6920== LEAK SUMMARY:
==6920==    definitely lost: 56 bytes in 1 blocks
==6920==    indirectly lost: 0 bytes in 0 blocks
==6920==      possibly lost: 912 bytes in 3 blocks
==6920==    still reachable: 1,304,800 bytes in 13,574 blocks
==6920==                       of which reachable via heuristic:
==6920==                         stdstring          : 87,430 bytes in 1,691 blocks
==6920==         suppressed: 0 bytes in 0 blocks
==6920== Reachable blocks (those to which a pointer was found) are not shown.
==6920== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==6920== 
==6920== For counts of detected and suppressed errors, rerun with: -v
==6920== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
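
Almost everything in that summary is "still reachable" rather than "lost", which is why the default report shows so little. A small sketch that parses the LEAK SUMMARY lines above and computes each category's share (the helper name `parse_summary` is made up for illustration):

```python
import re

# LEAK SUMMARY lines copied from the Valgrind output above.
SUMMARY = """\
==6920==    definitely lost: 56 bytes in 1 blocks
==6920==    indirectly lost: 0 bytes in 0 blocks
==6920==      possibly lost: 912 bytes in 3 blocks
==6920==    still reachable: 1,304,800 bytes in 13,574 blocks
==6920==         suppressed: 0 bytes in 0 blocks
"""

ROW = re.compile(r"==\d+==\s+([a-z ]+?): ([\d,]+) bytes in ([\d,]+) blocks")


def parse_summary(text):
    """Map each leak category to a (bytes, blocks) tuple."""
    result = {}
    for kind, nbytes, nblocks in ROW.findall(text):
        result[kind.strip()] = (
            int(nbytes.replace(",", "")),
            int(nblocks.replace(",", "")),
        )
    return result


summary = parse_summary(SUMMARY)
total_bytes = sum(nbytes for nbytes, _ in summary.values())

for kind, (nbytes, nblocks) in summary.items():
    share = 100.0 * nbytes / total_bytes
    print(f"{kind:16s} {nbytes:>10,d} bytes {nblocks:>7,d} blocks {share:6.2f}%")
```

The category totals add up to the "in use at exit" figure in the HEAP SUMMARY (1,305,768 bytes), so nothing is missing from the breakdown. Memory that keeps growing but is still referenced at exit lands in "still reachable", so those records are the ones to look at.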

With --leak-check=full --show-leak-kinds=all:

==9090== 36,480 bytes in 760 blocks are still reachable in loss record 2,934 of 2,942
==9090==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x465BE0: ExtConnection::ExtConnection(Reference<DocumentLayer>, Reference<BufferedConnection>, long) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x45226A: (anonymous namespace)::ExtServerConnectionActorState<(anonymous namespace)::ExtServerConnectionActor>::a_body1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44D106: (anonymous namespace)::ExtServerConnectionActor::ExtServerConnectionActor(Reference<DocumentLayer> const&, Reference<BufferedConnection> const&, long const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44D213: extServerConnection(Reference<DocumentLayer> const&, Reference<BufferedConnection> const&, long const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x45925E: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopBody1when1(Reference<IConnection> const&, int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x462F05: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_callback_fire(ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >*, Reference<IConnection>) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x460115: ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >::fire(Reference<IConnection> const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6E1173: finishSendAndDelPromiseRef (flow.h:331)
==9090==    by 0x6E1173: N2::Listener::DoAcceptActorState<N2::Listener::DoAcceptActor>::a_body1cont2(Void const&, int) (Net2.actor.g.cpp:759)
==9090==    by 0x6E1D43: a_body1when1 (Net2.actor.g.cpp:766)
==9090==    by 0x6E1D43: a_callback_fire (Net2.actor.g.cpp:780)
==9090==    by 0x6E1D43: ActorCallback<N2::Listener::DoAcceptActor, 0, Void>::fire(Void const&) (flow.h:850)
==9090==    by 0x421502: void SAV<Void>::send<Void>(Void&&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6DE09E: send<Void> (flow.h:630)
==9090==    by 0x6DE09E: N2::BindPromise::operator()(boost::system::error_code const&, unsigned long) (Net2.actor.cpp:259)
==9090== 
==9090== 40,000 bytes in 1 blocks are still reachable in loss record 2,935 of 2,942
==9090==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x54C0657: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x522E4D9: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x52084B4: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5208CF9: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x52B565A: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x52B58F8: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x522796F: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5482949: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x51864CD: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x52A934B: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x51596F5: ??? (in /usr/lib/libfdb_c.so)
==9090== 
==9090== 65,556 bytes in 1 blocks are still reachable in loss record 2,936 of 2,942
==9090==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x54C0657: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x54C06E9: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x513FED7: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5385588: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x538707C: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x53924FA: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5392714: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x538266F: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x548AFBB: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x548BB89: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x522796F: ??? (in /usr/lib/libfdb_c.so)
==9090== 
==9090== 65,556 bytes in 1 blocks are still reachable in loss record 2,937 of 2,942
==9090==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x54C0657: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x54C06E9: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x513FED7: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5385588: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5386DE4: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x522796F: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x5482949: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x51864CD: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x52A934B: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x51596F5: ??? (in /usr/lib/libfdb_c.so)
==9090==    by 0x513CCB8: fdb_run_network (in /usr/lib/libfdb_c.so)
==9090== 
==9090== 66,880 bytes in 760 blocks are still reachable in loss record 2,938 of 2,942
==9090==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x6E1B76: DoAcceptActorState (Net2.actor.cpp:463)
==9090==    by 0x6E1B76: DoAcceptActor (Net2.actor.g.cpp:823)
==9090==    by 0x6E1B76: doAccept (Net2.actor.cpp:461)
==9090==    by 0x6E1B76: N2::Listener::accept() (Net2.actor.cpp:455)
==9090==    by 0x456F96: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopBody1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x45549D: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopHead1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x45A053: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopBody1cont1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x4592A6: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopBody1when1(Reference<IConnection> const&, int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x462F05: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_callback_fire(ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >*, Reference<IConnection>) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x460115: ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >::fire(Reference<IConnection> const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6E1173: finishSendAndDelPromiseRef (flow.h:331)
==9090==    by 0x6E1173: N2::Listener::DoAcceptActorState<N2::Listener::DoAcceptActor>::a_body1cont2(Void const&, int) (Net2.actor.g.cpp:759)
==9090==    by 0x6E1D43: a_body1when1 (Net2.actor.g.cpp:766)
==9090==    by 0x6E1D43: a_callback_fire (Net2.actor.g.cpp:780)
==9090==    by 0x6E1D43: ActorCallback<N2::Listener::DoAcceptActor, 0, Void>::fire(Void const&) (flow.h:850)
==9090==    by 0x421502: void SAV<Void>::send<Void>(Void&&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6DE09E: send<Void> (flow.h:630)
==9090==    by 0x6DE09E: N2::BindPromise::operator()(boost::system::error_code const&, unsigned long) (Net2.actor.cpp:259)
==9090== 
==9090== 80,000 bytes in 1 blocks are still reachable in loss record 2,939 of 2,942
==9090==    at 0x4C3089F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x6CCF27: initProfiling() (Net2.actor.cpp:81)
==9090==    by 0x67E583: setupSlowTaskProfiler() (Platform.cpp:2828)
==9090==    by 0x454610: (anonymous namespace)::SetupActorState<(anonymous namespace)::SetupActor>::a_body1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44E356: (anonymous namespace)::SetupActor::SetupActor(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44E3DF: setup(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x4507C3: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090== 
==9090== 80,000 bytes in 1 blocks are still reachable in loss record 2,940 of 2,942
==9090==    at 0x4C3089F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x6CCF43: initProfiling() (Net2.actor.cpp:82)
==9090==    by 0x67E583: setupSlowTaskProfiler() (Platform.cpp:2828)
==9090==    by 0x454610: (anonymous namespace)::SetupActorState<(anonymous namespace)::SetupActor>::a_body1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44E356: (anonymous namespace)::SetupActor::SetupActor(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44E3DF: setup(NetworkAddress const&, Optional<unsigned short> const&, std::string const&, ConnectionOptions const&, char const* const&, std::string const&, std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&, std::vector<std::pair<FDBNetworkOption, Standalone<StringRef> >, std::allocator<std::pair<FDBNetworkOption, Standalone<StringRef> > > > const&, std::string const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x4507C3: main (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090== 
==9090== 93,568 bytes in 688 blocks are still reachable in loss record 2,941 of 2,942
==9090==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x4335B2: BufferedConnection::BufferedConnection(Reference<IConnection>) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x459227: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopBody1when1(Reference<IConnection> const&, int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x462F05: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_callback_fire(ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >*, Reference<IConnection>) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x460115: ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >::fire(Reference<IConnection> const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6E1173: finishSendAndDelPromiseRef (flow.h:331)
==9090==    by 0x6E1173: N2::Listener::DoAcceptActorState<N2::Listener::DoAcceptActor>::a_body1cont2(Void const&, int) (Net2.actor.g.cpp:759)
==9090==    by 0x6E1D43: a_body1when1 (Net2.actor.g.cpp:766)
==9090==    by 0x6E1D43: a_callback_fire (Net2.actor.g.cpp:780)
==9090==    by 0x6E1D43: ActorCallback<N2::Listener::DoAcceptActor, 0, Void>::fire(Void const&) (flow.h:850)
==9090==    by 0x421502: void SAV<Void>::send<Void>(Void&&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6DE09E: send<Void> (flow.h:630)
==9090==    by 0x6DE09E: N2::BindPromise::operator()(boost::system::error_code const&, unsigned long) (Net2.actor.cpp:259)
==9090==    by 0x6DE655: operator() (bind_handler.hpp:65)
==9090==    by 0x6DE655: asio_handler_invoke<boost::asio::detail::binder1<N2::BindPromise, boost::system::error_code> > (handler_invoke_hook.hpp:69)
==9090==    by 0x6DE655: invoke<boost::asio::detail::binder1<N2::BindPromise, boost::system::error_code>, N2::BindPromise> (handler_invoke_helpers.hpp:37)
==9090==    by 0x6DE655: complete<boost::asio::detail::binder1<N2::BindPromise, boost::system::error_code> > (handler_work.hpp:82)
==9090==    by 0x6DE655: boost::asio::detail::reactive_socket_accept_op<boost::asio::basic_socket<boost::asio::ip::tcp>, boost::asio::ip::tcp, N2::BindPromise>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) (reactive_socket_accept_op.hpp:137)
==9090==    by 0x6DA22B: complete (scheduler_operation.hpp:40)
==9090==    by 0x6DA22B: boost::asio::detail::epoll_reactor::descriptor_state::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) (epoll_reactor.ipp:776)
==9090==    by 0x6D7B78: complete (scheduler_operation.hpp:40)
==9090==    by 0x6D7B78: boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) (scheduler.ipp:401)
==9090== 
==9090== 115,520 bytes in 760 blocks are still reachable in loss record 2,942 of 2,942
==9090==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9090==    by 0x45224B: (anonymous namespace)::ExtServerConnectionActorState<(anonymous namespace)::ExtServerConnectionActor>::a_body1(int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44D106: (anonymous namespace)::ExtServerConnectionActor::ExtServerConnectionActor(Reference<DocumentLayer> const&, Reference<BufferedConnection> const&, long const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x44D213: extServerConnection(Reference<DocumentLayer> const&, Reference<BufferedConnection> const&, long const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x45925E: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_body1loopBody1when1(Reference<IConnection> const&, int) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x462F05: (anonymous namespace)::ExtServerActorState<(anonymous namespace)::ExtServerActor>::a_callback_fire(ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >*, Reference<IConnection>) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x460115: ActorCallback<(anonymous namespace)::ExtServerActor, 0, Reference<IConnection> >::fire(Reference<IConnection> const&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6E1173: finishSendAndDelPromiseRef (flow.h:331)
==9090==    by 0x6E1173: N2::Listener::DoAcceptActorState<N2::Listener::DoAcceptActor>::a_body1cont2(Void const&, int) (Net2.actor.g.cpp:759)
==9090==    by 0x6E1D43: a_body1when1 (Net2.actor.g.cpp:766)
==9090==    by 0x6E1D43: a_callback_fire (Net2.actor.g.cpp:780)
==9090==    by 0x6E1D43: ActorCallback<N2::Listener::DoAcceptActor, 0, Void>::fire(Void const&) (flow.h:850)
==9090==    by 0x421502: void SAV<Void>::send<Void>(Void&&) (in /home/yennie/repos/fdb-document-layer/build/bin/fdbdoc)
==9090==    by 0x6DE09E: send<Void> (flow.h:630)
==9090==    by 0x6DE09E: N2::BindPromise::operator()(boost::system::error_code const&, unsigned long) (Net2.actor.cpp:259)
==9090==    by 0x6DE655: operator() (bind_handler.hpp:65)
==9090==    by 0x6DE655: asio_handler_invoke<boost::asio::detail::binder1<N2::BindPromise, boost::system::error_code> > (handler_invoke_hook.hpp:69)
==9090==    by 0x6DE655: invoke<boost::asio::detail::binder1<N2::BindPromise, boost::system::error_code>, N2::BindPromise> (handler_invoke_helpers.hpp:37)
==9090==    by 0x6DE655: complete<boost::asio::detail::binder1<N2::BindPromise, boost::system::error_code> > (handler_work.hpp:82)
==9090==    by 0x6DE655: boost::asio::detail::reactive_socket_accept_op<boost::asio::basic_socket<boost::asio::ip::tcp>, boost::asio::ip::tcp, N2::BindPromise>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) (reactive_socket_accept_op.hpp:137)
==9090== 
==9090== LEAK SUMMARY:
==9090==    definitely lost: 56 bytes in 1 blocks
==9090==    indirectly lost: 0 bytes in 0 blocks
==9090==      possibly lost: 912 bytes in 3 blocks
==9090==    still reachable: 1,054,730 bytes in 10,667 blocks
==9090==                       of which reachable via heuristic:
==9090==                         stdstring          : 81,756 bytes in 1,503 blocks
==9090==         suppressed: 0 bytes in 0 blocks
==9090== 
==9090== For counts of detected and suppressed errors, rerun with: -v
==9090== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
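
With almost three thousand loss records, it can help to total them by kind before reading individual stacks. Below is a minimal sketch, assuming the memcheck output has been saved as text in the format shown above; the function name `summarize_loss_records` is hypothetical and not part of valgrind or the Document Layer.

```python
import re
from collections import defaultdict

# Matches memcheck loss-record headers such as:
#   ==9090== 93,568 bytes in 688 blocks are still reachable in loss record 2,941 of 2,942
# The optional "(N direct, M indirect)" group covers records that report both.
RECORD_RE = re.compile(
    r"^==\d+==\s+([\d,]+)(?: \([^)]*\))? bytes in ([\d,]+) blocks are "
    r"(definitely lost|indirectly lost|possibly lost|still reachable) "
    r"in loss record"
)


def summarize_loss_records(lines):
    """Return {leak kind: (total bytes, total blocks)} over all loss records.

    Hypothetical helper for triage only; it ignores the stack frames and the
    final LEAK SUMMARY section, which valgrind already totals itself.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = RECORD_RE.match(line)
        if match is None:
            continue  # stack frame, separator, or summary line
        totals[match.group(3)][0] += int(match.group(1).replace(",", ""))
        totals[match.group(3)][1] += int(match.group(2).replace(",", ""))
    return {kind: (nbytes, nblocks) for kind, (nbytes, nblocks) in totals.items()}


if __name__ == "__main__":
    sample = [
        "==9090== 93,568 bytes in 688 blocks are still reachable in loss record 2,941 of 2,942",
        "==9090==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)",
        "==9090== 115,520 bytes in 760 blocks are still reachable in loss record 2,942 of 2,942",
    ]
    for kind, (nbytes, nblocks) in summarize_loss_records(sample).items():
        print(f"{kind}: {nbytes} bytes in {nblocks} blocks")
```

Run against the two records quoted above, this reports 209,088 "still reachable" bytes in 1,448 blocks, both allocated on the connection-accept path (`BufferedConnection` and `ExtServerConnectionActor`).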

Just merged a PR that fixes the memory leaks, or at least the majority of them.

Looks great so far! Thanks!
