What can cause proxy commit batch memory to be exceeded?

In one of our 10-node clusters running FoundationDB 6.2.20, we are observing that the database is unavailable. In the proxy logs I am seeing a lot of errors like the one below:

trace.10.196.78.225.4501.1597232984.Qwemy7.1.72.xml:<Event Severity="30" Time="1597826631.057304" Type="ProxyCommitBatchMemoryThresholdExceeded" ID="0000000000000000" SuppressedEventCount="5093" MemBytesCount="429482900" MemLimit="429496729" Machine="10.196.78.225:4501" LogGroup="default" Roles="CD,MP" />

I have placed all the logs here.
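
For reference, here is a minimal sketch of how these events can be pulled out of the trace XML files. This is just something I put together for digging through the logs; it assumes Python 3 with only the standard library, trace files under /var/log/foundationdb, and one <Event .../> element per line, as in the sample above.

#!/usr/bin/env python3
# Sketch: scan FoundationDB trace XML logs for
# ProxyCommitBatchMemoryThresholdExceeded events and show how close
# MemBytesCount got to MemLimit over time.
import glob
import xml.etree.ElementTree as ET

for path in sorted(glob.glob("/var/log/foundationdb/trace.*.xml")):
    with open(path, errors="replace") as f:
        for line in f:
            line = line.strip()
            if "ProxyCommitBatchMemoryThresholdExceeded" not in line:
                continue
            try:
                ev = ET.fromstring(line)  # each event is a self-contained element
            except ET.ParseError:
                continue
            used = int(ev.get("MemBytesCount", "0"))
            limit = int(ev.get("MemLimit", "1"))
            print("%s t=%s used=%d limit=%d (%.1f%% of limit, suppressed=%s)"
                  % (path, ev.get("Time"), used, limit,
                     100.0 * used / limit, ev.get("SuppressedEventCount", "0")))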

foundationdb.conf looks like this:

[fdbmonitor]
user = ubuntu
group = ubuntu

[general]
restart_delay = 60
cluster_file = /etc/foundationdb/fdb.cluster

[fdbserver]
command = /usr/bin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb

[fdbserver.4500]

[fdbserver.4501]
class = stateless

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

The fdb status details output is below:

ubuntu@platform1:~$ fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

WARNING: Long delay (Ctrl-C to interrupt)

Using cluster file `/etc/foundationdb/fdb.cluster'.

Unable to commit after 5 seconds.

Configuration:
  Redundancy mode        - double
  Storage engine         - ssd-2
  Coordinators           - 3
  Desired Proxies        - 1
  Desired Logs           - 2

Cluster:
  FoundationDB processes - 20
  Zones                  - 10
  Machines               - 10
  Memory availability    - 10.8 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 1 machine
  Server time            - 08/19/20 09:37:27

Data:
  Replication health     - Healthy (Repartitioning)
  Moving data            - 0.161 GB
  Sum of key-value sizes - 349.031 GB
  Disk space used        - 842.336 GB

Operating space:
  Storage server         - 84.4 GB free on most full server
  Log server             - 91.2 GB free on most full server

Workload:
  Read rate              - 842 Hz
  Write rate             - 0 Hz
  Transactions started   - 668 Hz
  Transactions committed - 0 Hz
  Conflict rate          - 87 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  10.196.78.151:4500     (  2% cpu;  9% machine; 0.009 Gbps;  0% disk IO; 5.1 GB / 10.8 GB RAM  )
  10.196.78.151:4501     (  0% cpu;  9% machine; 0.009 Gbps;  0% disk IO; 0.1 GB / 10.8 GB RAM  )
  10.196.78.152:4500     (  7% cpu;  4% machine; 0.008 Gbps;  6% disk IO; 4.2 GB / 11.4 GB RAM  )
  10.196.78.152:4501     (  2% cpu;  4% machine; 0.008 Gbps;  6% disk IO; 0.1 GB / 11.4 GB RAM  )
  10.196.78.153:4500     (  3% cpu; 10% machine; 0.014 Gbps;  0% disk IO; 3.7 GB / 11.8 GB RAM  )
  10.196.78.153:4501     (  2% cpu; 10% machine; 0.014 Gbps;  0% disk IO; 0.1 GB / 11.8 GB RAM  )
  10.196.78.154:4500     (  2% cpu;  5% machine; 0.012 Gbps;  0% disk IO; 3.6 GB / 10.9 GB RAM  )
  10.196.78.154:4501     (  1% cpu;  5% machine; 0.012 Gbps;  0% disk IO; 0.3 GB / 10.9 GB RAM  )
  10.196.78.155:4500     (  2% cpu;  3% machine; 0.011 Gbps;  0% disk IO; 5.0 GB / 11.0 GB RAM  )
  10.196.78.155:4501     (  0% cpu;  3% machine; 0.011 Gbps;  0% disk IO; 0.1 GB / 11.0 GB RAM  )
  10.196.78.160:4500     (  3% cpu;  2% machine; 0.010 Gbps;  0% disk IO; 4.6 GB / 12.0 GB RAM  )
  10.196.78.160:4501     (  1% cpu;  2% machine; 0.010 Gbps;  0% disk IO; 0.2 GB / 12.0 GB RAM  )
  10.196.78.161:4500     (  2% cpu;  5% machine; 0.006 Gbps;  1% disk IO; 4.6 GB / 12.0 GB RAM  )
  10.196.78.161:4501     (  0% cpu;  5% machine; 0.006 Gbps;  1% disk IO; 0.1 GB / 12.0 GB RAM  )
  10.196.78.162:4500     (  3% cpu;  5% machine; 0.014 Gbps;  0% disk IO; 3.7 GB / 11.0 GB RAM  )
  10.196.78.162:4501     (  1% cpu;  5% machine; 0.014 Gbps;  0% disk IO; 0.2 GB / 11.0 GB RAM  )
  10.196.78.225:4500     (  8% cpu;  6% machine; 0.018 Gbps;  3% disk IO; 4.9 GB / 11.7 GB RAM  )
  10.196.78.225:4501     ( 14% cpu;  6% machine; 0.018 Gbps;  3% disk IO; 2.6 GB / 11.7 GB RAM  )
  10.196.78.226:4500     (  2% cpu;  2% machine; 0.003 Gbps;  0% disk IO; 3.3 GB / 11.7 GB RAM  )
  10.196.78.226:4501     (  1% cpu;  2% machine; 0.003 Gbps;  0% disk IO; 0.3 GB / 11.7 GB RAM  )

Coordination servers:
  10.196.78.154:4501  (reachable)
  10.196.78.225:4501  (reachable)
  10.196.78.226:4501  (reachable)

Client time: 08/19/20 09:37:18

WARNING: A single process is both a transaction log and a storage server.
  For best performance use dedicated disks for the transaction logs by setting process classes.

fdb>
  1. From the logs I am observing that the proxy process is using ~4 GB of memory. What could be causing the proxy to use this much memory?

  2. The logs say the proxy is using ~4 GB of memory, but Linux shows it using ~2.6 GB. Is this expected? (A sketch for collecting the Linux-side numbers follows below this list.)
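
To make the second question concrete, here is a minimal sketch of how the Linux-side numbers can be collected, assuming Python 3 on Linux and that the proxy is the fdbserver started on port 4501 (adjust the match as needed):

#!/usr/bin/env python3
# Sketch: print VmRSS and VmSize from /proc/<pid>/status for the
# fdbserver process(es) started on port 4501, so the OS view can be
# put side by side with the memory numbers in the trace logs.
import os

def fdbserver_pids():
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                cmdline = f.read().decode(errors="replace")
        except OSError:
            continue
        if "fdbserver" in cmdline and "4501" in cmdline:
            yield int(pid)

for pid in fdbserver_pids():
    with open("/proc/%d/status" % pid) as f:
        fields = dict(line.split(":", 1) for line in f if ":" in line)
    print("pid %d: VmRSS=%s VmSize=%s"
          % (pid, fields["VmRSS"].strip(), fields["VmSize"].strip()))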

OS - Ubuntu 16.04.6

@alexmiller @ajbeamon - Can you suggest something that could be causing this?

Just checking: could this be related in some way to a recent change made in this area here? That PR was merged in 6.2.19; I believe we have not encountered this error on earlier fdb versions.

After restarting the FoundationDB processes on all the nodes, I have not observed the proxy using much memory for the last 6 days (under 200 MB now) with almost the same load.

I searched the forums and did not find any discussion around ProxyCommitBatchMemoryThresholdExceeded. Given that each transaction is limited to < 100 KB, can someone suggest what kinds of things can cause memory pressure on the proxy?
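
For scale, a rough back-of-envelope using only the numbers above (the 8 GiB figure is the default per-process memory setting, which I am assuming applies since we do not override memory in foundationdb.conf):

# Back-of-envelope only; the 8 GiB default per-process memory setting is an assumption.
mem_limit = 429496729              # MemLimit from the trace event, in bytes
default_process_mem = 8 << 30      # assumed default 8 GiB memory setting
txn_size = 100 * 1024              # transactions in our workload are < 100 KB

print(mem_limit / default_process_mem)  # ~0.05, i.e. the limit looks like ~5% of 8 GiB
print(mem_limit // txn_size)            # ~4194 such transactions to fill the limit

If that arithmetic is right, no single transaction should get anywhere near the limit, so presumably many commit batches would have to be held in memory at once to hit it; I would like to understand what can cause that.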

status json of current cluster state.
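
Separately, in case it is useful for tracking this if it happens again, here is a minimal sketch of how proxy memory could be watched over time via status json. The field names (cluster.processes.<id>.memory.used_bytes and the "proxy" role) are my reading of the 6.2 status json layout, so please treat them as assumptions.

#!/usr/bin/env python3
# Sketch: periodically run `fdbcli --exec "status json"` and print the
# memory use of every process that currently reports the proxy role.
# The status json field names below are assumptions based on FDB 6.2.
import json
import subprocess
import time

while True:
    out = subprocess.check_output(["fdbcli", "--exec", "status json"]).decode()
    out = out[out.find("{"):]  # tolerate any warning/banner lines before the JSON
    status = json.loads(out)
    for proc in status.get("cluster", {}).get("processes", {}).values():
        roles = [r.get("role") for r in proc.get("roles", [])]
        if "proxy" in roles:
            print("%s proxy used_bytes=%s"
                  % (proc.get("address"), proc.get("memory", {}).get("used_bytes")))
    time.sleep(60)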