PacketLimitExceeded error on stateless servers

Problem: we recently started getting the PacketLimitExceeded error (Severity 40).

We have a pretty heavy load on this DB, but we hadn’t seen this error before. Lately we have noticed increased latency when fetching data, which I think is the result of a higher conflict rate caused by this error.

It occurred on multiple different databases at once, I think due to increasing load on them. However, when monitoring status we get “The database is not being saturated by the workload”.

I have searched the forum and the issue tracker, but couldn’t find anything similar.

Also, in the code base there are “FlowKnobs” with this option, but it doesn’t seem possible to set it from the config, or am I wrong?

What is the proper way to solve this?

(!) Also worth noting that we get this message from status from time to time when facing this error (that is about a storage server), but it looks more like a consequence to me:

  Performance limited by process: Storage server MVCC memory.
  Most limiting process: <IP>:4501:tls

Foundationdb version:

FoundationDB CLI 7.1 (v7.1.23)
source version dd2e68b9b3175667914673539f06b8e1071c07d6
protocol fdb00b071010000

Configuration is as follows

Configuration:
  Redundancy mode        - double
  Storage engine         - ssd-redwood-1-experimental
  Coordinators           - 6
  Usable Regions         - 2
  Regions: 
    Primary -
        Datacenter                    - WC1
    Remote -
        Datacenter                    - EC1

Cluster:
  FoundationDB processes - 247 (less 0 excluded; 6 with errors)
  Zones                  - 6
  Machines               - 6
  Memory availability    - 3.5 GB per process on machine with least available
                           >>>>> (WARNING: 4.0 GB recommended) <<<<<
  Retransmissions rate   - 16 Hz
  Fault Tolerance        - 1 machines
  Server time            - 10/19/23 10:09:24

Data:
  Replication health     - Healthy (Repartitioning)
  Moving data            - 389.000 GB
  Sum of key-value sizes - 50.081 TB
  Disk space used        - 205.194 TB

Operating space:
  Storage server         - 701.4 GB free on most full server
  Log server             - 3341.2 GB free on most full server

Workload:
  Read rate              - 42293 Hz
  Write rate             - 50489 Hz
  Transactions started   - 27356 Hz
  Transactions committed - 560 Hz
  Conflict rate          - 5 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

This is what most of the stateless servers look like (status details):

  <IP>:4560:tls  ( 13% cpu; 20% machine; 1.916 Gbps; 45% disk IO; 2.8 GB / 3.5 GB RAM  )
    Last logged error: PacketLimitExceeded at Thu Oct 19 09:33:33 2023
  <IP>:4561:tls  (  7% cpu; 20% machine; 1.916 Gbps; 53% disk IO; 3.0 GB / 3.5 GB RAM  )
    Last logged error: PacketLimitExceeded at Thu Oct 19 01:11:17 2023
  <IP>:4562:tls  (  3% cpu; 20% machine; 1.916 Gbps; 46% disk IO; 2.5 GB / 3.5 GB RAM  )
    Last logged error: PacketLimitExceeded at Thu Oct 19 01:11:10 2023

Related error description from status json

        "messages": [
          {
            "description": "PacketLimitExceeded at Thu Oct 19 09:33:33 2023",
            "name": "process_error",
            "raw_log_message": "\"Severity\"=\"40\", \"ErrorKind\"=\"Unset\", \"Time\"=\"1697708013.974044\", \"DateTime\"=\"2023-10-19T09:33:33Z\", \"Type\"=\"PacketLimitExceeded\", \"ID\"=\"0000000000000000\", \"ToPeer\"=\"67.213.220.39:36284:tls\", \"Length\"=\"129276072\", \"ThreadID\"=\"12729161780852052616\", \"Backtrace\"=\"addr2line -e fdbserver.debug -p -C -f -i 0x366fadc 0x366e6f0 0x366eaee 0x3474c8e 0x347ed25 0xb924a8 0xba0603 0x10a5ca0 0x10a67e5 0x10a72d0 0x11ad550 0x360a35e 0xa574d9 0x7f092816b083\", \"Machine\"=\"<IP>:4560\", \"LogGroup\"=\"default\", \"Roles\"=\"CP\"",
            "time": 1697710000,
            "type": "PacketLimitExceeded"
          }
        ],

sudo systemctl status foundationdb.service doesn’t return any errors

Process counts per machine:

Region 1 (3 machines, each running the following):

  • 23 storage
  • 3 stateless
  • 23 transaction

Region 2 (3 machines, each running the following):

  • 15 storage
  • 3 stateless
  • 15 transaction

A few questions:

  • Did you change this knob on the client through a network option? (If so, I would strongly recommend against doing this.)
  • Given how you asked your question, I assume you didn’t change this knob on the server? It is possible to do, but you probably don’t want to, because it could cause large latency increases, which can cause all kinds of problems.
  • Do you have client logs enabled? If so, do you see the error there as well, or do you see LargePacketSent messages (which would be of Severity 30)? A sketch of enabling client traces follows these questions.
  • Did you somehow change the max transaction size on the client?
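
For reference, client traces are enabled through a network option before the database is opened. A minimal sketch with the Go bindings (the log directory path is only an example, adjust it to your setup):

  package main

  import "github.com/apple/foundationdb/bindings/go/src/fdb"

  func main() {
      fdb.MustAPIVersion(710)

      // Write client trace files (XML events like the server traces, including
      // any PacketLimitExceeded or LargePacket* messages) to this directory.
      // The path is only an example.
      if err := fdb.Options().SetTraceEnable("/var/log/foundationdb/client"); err != nil {
          panic(err)
      }

      db := fdb.MustOpenDefault()
      _ = db // use the database as usual; trace files are written in the background
  }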

It’s hard to tell what exactly is going on here. There should be safeguards against this error. Generally we don’t allow packets to grow beyond a threshold because that can cause bad latency spikes and even priority inversion problems. But without more context it’s hard to tell what’s going on here. It could be a configuration bug or it could be an FDB bug…

Answering the questions:

  • No, I haven’t changed the knob on the client.
  • I don’t have client logs at the moment; I will retrieve them and reply again. Generally speaking, the docker containers get “transaction cancelled”, which may be due to a context timeout.
  • I couldn’t find anything suggesting we changed the default transaction size. When batching, we apply our own limit of 30 kB per transaction, way below the 10 MB limit from the documentation (a sketch of making that cap explicit via a transaction option is below).
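
For completeness, here is roughly how that cap could also be made explicit per transaction through a transaction option. This is only a sketch with the Go bindings; the 30 kB value just mirrors our batching limit and is not something we actually set today:

  import "github.com/apple/foundationdb/bindings/go/src/fdb"

  // writeWithSizeCap mirrors our 30 kB batching limit as an explicit,
  // client-enforced transaction size limit (hypothetical; we do not set this today).
  func writeWithSizeCap(db fdb.Database, key fdb.Key, value []byte) error {
      _, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
          // Fail the transaction client-side if it grows beyond ~30 kB.
          if err := tr.Options().SetSizeLimit(30000); err != nil {
              return nil, err
          }
          tr.Set(key, value)
          return nil, nil
      })
      return err
  }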

Yes, 30 kB transactions shouldn’t be a problem at all.

The role running on the process reporting the error is a commit proxy. So the most likely (though not the only) explanation would be that the proxy batches more transactions than it can send to the remote (probably a tlog).

What is running on the machine logged above ("Machine"="<IP>:4560")? It would also help if you could download the debug package, run the addr2line command from the logs, and post the output here.

As a general comment: you’re running a very large number of transaction processes. While this shouldn’t cause the kind of problem you’re seeing, it is probably not optimal either. For a cluster of this size you probably want something closer to 6 tlogs, maybe even 3 (it depends a bit on the workload). We usually start with a small number of tlogs and add more as their CPUs get close to 100% or we see high commit latencies.
Are all of these tlogs even being recruited? (The quick-and-dirty way to check is to look for CPU activity; the more precise way is to look at the Roles field in the trace files.)

Below you can find the retrieved logs for a failing commit proxy. I have checked the other failing processes as well; indeed, the error only occurs when the “commit proxy” role is assigned to a stateless process.

What I have done so far: reduced the number of transaction processes as recommended, leaving one per server (so three in each region). I also tried increasing the number of stateless processes, which changed nothing.

I haven’t retrieved client logs yet, but I have played around with the load. When I turned off all containers (no load, only minor reads) and restarted foundationdb, the problem went away. When I turn on read-only containers that also do minor writes, the errors appear immediately.

./trace.<IP>.4561.1697724267.mpSk9X.2.135.xml:17993:<Event Severity="40" ErrorKind="Unset" Time="1698034854.330258" DateTime="2023-10-23T04:20:54Z" Type="PacketLimitExceeded" ID="0000000000000000" ToPeer="<IP>:60804:tls" Length="139867048" ThreadID="800837820210536801" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x366fadc 0x366e6f0 0x366eaee 0x3474c8e 0x347ed25 0xb924a8 0xba0603 0x10a5ca0 0x10a67e5 0x10a72d0 0x11ad550 0x360a35e 0xa574d9 0x7f49aa6ae083" Machine="<IP>:4561" LogGroup="default" Roles="CP" />

addr2line -e fdbserver.debug.x86_64 -p -C -f -i 0x366fadc 0x366e6f0 0x366eaee 0x3474c8e 0x347ed25 0xb924a8 0xba0603 0x10a5ca0 0x10a67e5 0x10a72d0 0x11ad550 0x360a35e 0xa574d9 0x7f49aa6ae083

std::string::_M_rep() const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3322
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.tcc:613
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:5967
 (inlined by) BaseTraceEvent::backtrace(std::string const&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Trace.cpp:1216
std::string::_M_rep() const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3322
 (inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3640
 (inlined by) BaseTraceEvent::log() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Trace.cpp:1235
BaseTraceEvent::~BaseTraceEvent() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Trace.cpp:1275
sendPacket(TransportData*, Reference<Peer>, ISerializeSource const&, Endpoint const&, bool) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Trace.h:543 (discriminator 2)
Reference<Peer>::~Reference() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/FastRef.h:124
 (inlined by) FlowTransport::sendUnreliable(ISerializeSource const&, Endpoint const&, bool) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/FlowTransport.actor.cpp:1770
Reference<Peer>::~Reference() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/FastRef.h:124
 (inlined by) a_body1cont2 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/networksender.actor.h:38
ActorCallback<(anonymous namespace)::NetworkSenderActor<GetKeyServerLocationsReply>, 0, GetKeyServerLocationsReply>::fire(GetKeyServerLocationsReply const&) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbrpc/networksender.actor.g.h:178
 (inlined by) fire at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:1318
void SAV<GetKeyServerLocationsReply>::send<GetKeyServerLocationsReply&>(GetKeyServerLocationsReply&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:659
 (inlined by) void ReplyPromise<GetKeyServerLocationsReply>::send<GetKeyServerLocationsReply&>(GetKeyServerLocationsReply&) const at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/fdbrpc.h:120
 (inlined by) a_body1cont3 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/CommitProxyServer.actor.cpp:1796
 (inlined by) a_body1cont2break1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:7954
 (inlined by) a_body1cont2loopBody1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:7897
(anonymous namespace)::DoKeyServerLocationRequestActorState<(anonymous namespace)::DoKeyServerLocationRequestActor>::a_body1cont2(Void const&, int) [clone .isra.4754] [clone .constprop.25679] at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:7887
 (inlined by) a_body1cont2 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:7676
ActorCallback<(anonymous namespace)::DoKeyServerLocationRequestActor, 1, Void>::fire(Void const&) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:7725
 (inlined by) fire at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:1318
void SAV<Void>::send<Void>(Void&&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:659
Promise<Void>::~Promise() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:927
 (inlined by) N2::PromiseTask::~PromiseTask() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:1206
 (inlined by) N2::PromiseTask::operator()() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:1213
 (inlined by) N2::Net2::run() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:1559
main at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/fdbserver.actor.cpp:2141 (discriminator 5)

And these are the client traces.

Thank you for the advice on retrieving logs; I learned a lot. Any help on the issue would be greatly appreciated:

Severity 40 (I can’t provide addr2line output unfortunately, as I couldn’t find libfdb_c.so-debug):

./trace.172.18.0.5.1.1698108180.Y6Vm2T.0.1.xml:2970:<Event Severity="40" ErrorKind="Unset" Time="1698108287.621518" DateTime="2023-10-24T00:44:47Z" Type="PacketLimitExceeded" ID="0000000000000000" FromPeer="<IP>:4500:tls" Length="140500720" ThreadID="11821446256232085457" Backtrace="addr2line -e libfdb_c.so-debug -p -C -f -i 0xa5320c 0xa51e20 0xa5221e 0x93a8e9 0x93c627 0x93b17b 0x93cfbd 0x7c74d0 0x9e657e 0x325ef2 0x88fbcf 0x2ddb32 0x2b9089 0xffff801a2748a126" Machine="172.18.0.5:1" LogGroup="default" ClientDescription="primary-7.1.23-11821446256232085457" />

Severity 30 (LargePacketReceived, the counterpart of the messages you mentioned), plus SlowTask and ConnectionClosed events (not posting those here):

./trace.172.18.0.5.1.1698108180.Y6Vm2T.0.1.xml:1575:<Event Severity="30" Time="1698108273.700053" DateTime="2023-10-24T00:44:33Z" Type="LargePacketReceived" ID="0000000000000000" SuppressedEventCount="0" FromPeer="<IP>:4500:tls" Length="3954352" Token="65037c2cf0098638" ThreadID="11821446256232085457" Machine="172.18.0.5:1" LogGroup="default" ClientDescription="primary-7.1.23-11821446256232085457" />

Severity 30, but from another cluster, with a bigger value in the “Length” field:

./trace.172.19.0.4.1.1698108158.WLW4H8.0.1.xml:20261:<Event Severity="30" Time="1698109248.706203" DateTime="2023-10-24T01:00:48Z" Type="LargePacketReceived" ID="0000000000000000" SuppressedEventCount="0" FromPeer="<IP>:4562:tls" Length="95547696" Token="03ebc63ccf8e394c" ThreadID="14986448794029255822" Machine="172.19.0.4:1" LogGroup="default" ClientDescription="primary-7.1.23-14986448794029255822" />

It seems GetKeyServerLocationsReply is the culprit – which is very strange… Do you use long DNS names?

But generally, this doesn’t make much sense to me. Maybe @jzhou has seen something like this before?

This is indeed weird. GetKeyServerLocationsReply should be almost constant size, containing storage server information for the given key. We’ve never seen this before.

As for the server names, we do not use DNS names, just IPs.

Is it possible that the cause is the multi-region setup? It only affects our multi-region setups (two out of three); the non-multi-region cluster is not affected.

UPD: I have just realised that the unaffected cluster runs a different FoundationDB version (6.3.23) and storage engine (ssd), compared to 7.1.23 with Redwood on the clusters that show the error.

I don’t think this is Redwood-related; Redwood isn’t involved in the offending call. Multi-region and the FDB version are more likely to make a difference (though neither makes a lot of sense to me either). As Jingyu said, the GetKeyServerLocationsReply object should be of almost constant size: it contains two keys and 2-4 storage server interfaces. I also don’t think that structure has changed in any meaningful way in the last few versions.

So basically I believe you would need someone with a lot of debugging experience actually looking at the system. My best guess is some kind of bug that we never reproduced in testing or production.

@redcuckoo Do you by any chance use fdb_transaction_get_estimated_range_size_bytes somewhere in the client code? That would be one way to create an unbounded request. For example, if you do this for the range ['', \xff), it could, if I read the code correctly, cause the issue you’re seeing.
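
To make that concrete, a call of roughly this shape (a sketch with the Go bindings, where tr is an open transaction) asks for location information covering every shard in the cluster in a single request:

  // Sketch: estimate over the entire keyspace, '' up to \xff. With ~50 TB of
  // data this range spans a huge number of shards, so the location lookup
  // behind the estimate produces an enormous reply.
  wholeKeyspace := fdb.KeyRange{Begin: fdb.Key(""), End: fdb.Key("\xff")}
  size, err := tr.Snapshot().GetEstimatedRangeSizeBytes(wholeKeyspace).Get()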

That is actually the answer. Thank you so much @markus.pilman and @jzhou for helping to figure it out. I looked at the code and found that we are indeed using it in one of our services.

How we used it:
We had metrics that reported the size of each namespace in the DB. Eventually, as the database grew, the range for some prefixes became too big and this error popped up.

This is how it looks in the code; doing this is not a good idea for a big DB with a lot of data per namespace:

  nsRange, err := fdb.PrefixRange(tuple.Tuple{int(prefix)}.FDBKey())
  if err != nil {
      return nil, fmt.Errorf("failed make prefix range, %w", err)
  }
  size, err := transaction.Snapshot().GetEstimatedRangeSizeBytes(nsRange).Get()

Here prefix is the namespace, and GetEstimatedRangeSizeBytes is a function from the Go bindings that wraps fdb_transaction_get_estimated_range_size_bytes.

I am glad we found the problem. However, I would consider this to be an FDB bug :frowning: Basically, you’re using an API which, according to the documentation, should be safe to use but can cause server crashes.

We’ll need to think about how to fix this. The easiest fix would be to return an error to the client when this happens, but chopping a range up into smaller ranges might not be a trivial task for an application. The client should be able to do this, but it might make the operation more expensive than a user would anticipate.
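
For illustration only: if the application happens to have a natural finer-grained sub-key inside each namespace, it can already do a crude version of that chopping itself by summing estimates over sub-prefixes, so that each call only covers a fraction of the shards. A sketch in Go; the (namespace, bucket) key layout and estimateNamespaceSize are hypothetical:

  import (
      "fmt"

      "github.com/apple/foundationdb/bindings/go/src/fdb"
      "github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
  )

  // estimateNamespaceSize sums size estimates over hypothetical (namespace, bucket)
  // sub-prefixes instead of estimating the whole namespace in one call, so each
  // underlying location lookup spans far fewer shards.
  func estimateNamespaceSize(tr fdb.Transaction, namespace int) (int64, error) {
      var total int64
      for bucket := 0; bucket < 256; bucket++ {
          r, err := fdb.PrefixRange(tuple.Tuple{namespace, bucket}.FDBKey())
          if err != nil {
              return 0, fmt.Errorf("failed to make prefix range: %w", err)
          }
          size, err := tr.Snapshot().GetEstimatedRangeSizeBytes(r).Get()
          if err != nil {
              return 0, err
          }
          total += size
      }
      return total, nil
  }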

Is this scenario driven by a large dataset AND an unbounded range estimation, or is an estimation over a sufficiently large bounded range (e.g. a directory) enough to trigger this? What is your best guess for the size at which this phenomenon starts manifesting?

It depends on the average shard size and the size of your cluster. So this bug will trigger if the range you’re trying to get an estimate for is split across too many fdb storage processes.
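
As a rough back-of-envelope, using the numbers from this thread and assuming shards on the order of a few hundred MB: ~50 TB of key-value data works out to something like 10^5 shards, and the location reply carries boundary keys plus a few storage server interfaces per shard (very roughly a kilobyte each), so a single lookup covering the whole range lands around 10^8 bytes, which is in line with the ~130-140 MB Length values in the PacketLimitExceeded traces above.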

So this situation is really not great, as I don’t believe the client can easily do much here:

  1. If the range you’re trying to estimate is hot, this bug is more likely to trigger (since the shards will be smaller).
  2. If the range you pass is very large, this bug is more likely to trigger.
  3. If the cluster serving this range is very large (has many storage servers), this bug is more likely to trigger.

So I don’t think you can easily estimate the above and make a good call. I think the best approach is to be very conservative and use this feature only for ranges you expect to be small (maybe <10 GB, but I’m not sure). Even if you don’t see crashes, you might see higher tail latencies because of this.

Hopefully we’ll have a patch out soon
