On one of our clusters (10 nodes, running FoundationDB 6.2.20) the database has become unavailable. In the proxy logs I am seeing a lot of errors like the one below:
trace.10.196.78.225.4501.1597232984.Qwemy7.1.72.xml:<Event Severity="30" Time="1597826631.057304" Type="ProxyCommitBatchMemoryThresholdExceeded" ID="0000000000000000" SuppressedEventCount="5093" MemBytesCount="429482900" MemLimit="429496729" Machine="10.196.78.225:4501" LogGroup="default" Roles="CD,MP" />
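To make the numbers in this event easier to read, here is a minimal sketch that parses the line and converts the two memory fields. It assumes that MemBytesCount and MemLimit are plain byte counts (suggested by the field name, but not confirmed here) and that the text before `<Event` is just the file name prefix added by grep. The helper name `summarize_event` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The grep output line from above: "<trace file name>:<Event ... />"
line = (
    'trace.10.196.78.225.4501.1597232984.Qwemy7.1.72.xml:'
    '<Event Severity="30" Time="1597826631.057304" '
    'Type="ProxyCommitBatchMemoryThresholdExceeded" ID="0000000000000000" '
    'SuppressedEventCount="5093" MemBytesCount="429482900" '
    'MemLimit="429496729" Machine="10.196.78.225:4501" '
    'LogGroup="default" Roles="CD,MP" />'
)


def summarize_event(grep_line):
    """Parse one trace event and report its memory fields in MiB.

    Assumption: MemBytesCount and MemLimit are byte counts.
    """
    # Drop the file name prefix that grep puts in front of the XML element.
    xml_part = grep_line[grep_line.index("<Event"):]
    event = ET.fromstring(xml_part)
    used = int(event.get("MemBytesCount"))
    limit = int(event.get("MemLimit"))
    return {
        "type": event.get("Type"),
        "machine": event.get("Machine"),
        "suppressed": int(event.get("SuppressedEventCount")),
        "used_mib": used / 2**20,
        "limit_mib": limit / 2**20,
        "used_fraction": used / limit,
    }


summary = summarize_event(line)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the byte assumption holds, the printed values are worth comparing against the per-process RAM figures in the status output further down.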
I have placed all the logs here.
Our foundationdb.conf looks like this:
[fdbmonitor]
user = ubuntu
group = ubuntu
[general]
restart_delay = 60
cluster_file = /etc/foundationdb/fdb.cluster
[fdbserver]
command = /usr/bin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
[fdbserver.4500]
[fdbserver.4501]
class = stateless
[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb
[backup_agent.1]
The output of status details in fdbcli is below:
ubuntu@platform1:~$ fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb> status details
WARNING: Long delay (Ctrl-C to interrupt)
Using cluster file `/etc/foundationdb/fdb.cluster'.
Unable to commit after 5 seconds.
Configuration:
Redundancy mode - double
Storage engine - ssd-2
Coordinators - 3
Desired Proxies - 1
Desired Logs - 2
Cluster:
FoundationDB processes - 20
Zones - 10
Machines - 10
Memory availability - 10.8 GB per process on machine with least available
Retransmissions rate - 1 Hz
Fault Tolerance - 1 machine
Server time - 08/19/20 09:37:27
Data:
Replication health - Healthy (Repartitioning)
Moving data - 0.161 GB
Sum of key-value sizes - 349.031 GB
Disk space used - 842.336 GB
Operating space:
Storage server - 84.4 GB free on most full server
Log server - 91.2 GB free on most full server
Workload:
Read rate - 842 Hz
Write rate - 0 Hz
Transactions started - 668 Hz
Transactions committed - 0 Hz
Conflict rate - 87 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Process performance details:
10.196.78.151:4500 ( 2% cpu; 9% machine; 0.009 Gbps; 0% disk IO; 5.1 GB / 10.8 GB RAM )
10.196.78.151:4501 ( 0% cpu; 9% machine; 0.009 Gbps; 0% disk IO; 0.1 GB / 10.8 GB RAM )
10.196.78.152:4500 ( 7% cpu; 4% machine; 0.008 Gbps; 6% disk IO; 4.2 GB / 11.4 GB RAM )
10.196.78.152:4501 ( 2% cpu; 4% machine; 0.008 Gbps; 6% disk IO; 0.1 GB / 11.4 GB RAM )
10.196.78.153:4500 ( 3% cpu; 10% machine; 0.014 Gbps; 0% disk IO; 3.7 GB / 11.8 GB RAM )
10.196.78.153:4501 ( 2% cpu; 10% machine; 0.014 Gbps; 0% disk IO; 0.1 GB / 11.8 GB RAM )
10.196.78.154:4500 ( 2% cpu; 5% machine; 0.012 Gbps; 0% disk IO; 3.6 GB / 10.9 GB RAM )
10.196.78.154:4501 ( 1% cpu; 5% machine; 0.012 Gbps; 0% disk IO; 0.3 GB / 10.9 GB RAM )
10.196.78.155:4500 ( 2% cpu; 3% machine; 0.011 Gbps; 0% disk IO; 5.0 GB / 11.0 GB RAM )
10.196.78.155:4501 ( 0% cpu; 3% machine; 0.011 Gbps; 0% disk IO; 0.1 GB / 11.0 GB RAM )
10.196.78.160:4500 ( 3% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 4.6 GB / 12.0 GB RAM )
10.196.78.160:4501 ( 1% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 0.2 GB / 12.0 GB RAM )
10.196.78.161:4500 ( 2% cpu; 5% machine; 0.006 Gbps; 1% disk IO; 4.6 GB / 12.0 GB RAM )
10.196.78.161:4501 ( 0% cpu; 5% machine; 0.006 Gbps; 1% disk IO; 0.1 GB / 12.0 GB RAM )
10.196.78.162:4500 ( 3% cpu; 5% machine; 0.014 Gbps; 0% disk IO; 3.7 GB / 11.0 GB RAM )
10.196.78.162:4501 ( 1% cpu; 5% machine; 0.014 Gbps; 0% disk IO; 0.2 GB / 11.0 GB RAM )
10.196.78.225:4500 ( 8% cpu; 6% machine; 0.018 Gbps; 3% disk IO; 4.9 GB / 11.7 GB RAM )
10.196.78.225:4501 ( 14% cpu; 6% machine; 0.018 Gbps; 3% disk IO; 2.6 GB / 11.7 GB RAM )
10.196.78.226:4500 ( 2% cpu; 2% machine; 0.003 Gbps; 0% disk IO; 3.3 GB / 11.7 GB RAM )
10.196.78.226:4501 ( 1% cpu; 2% machine; 0.003 Gbps; 0% disk IO; 0.3 GB / 11.7 GB RAM )
Coordination servers:
10.196.78.154:4501 (reachable)
10.196.78.225:4501 (reachable)
10.196.78.226:4501 (reachable)
Client time: 08/19/20 09:37:18
WARNING: A single process is both a transaction log and a storage server.
For best performance use dedicated disks for the transaction logs by setting process classes.
fdb>
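For comparing the per-process RAM figures in the output above, here is a small sketch that pulls the address and memory columns out of the "Process performance details" lines. The regular expression is written against the line format shown in this particular output and is only an assumption about what other fdbcli versions print. The names `parse_process_line` and `PROCESS_LINE` are made up for illustration.

```python
import re

# A few lines copied from the "Process performance details" section above.
status_lines = [
    "10.196.78.225:4500 ( 8% cpu; 6% machine; 0.018 Gbps; 3% disk IO; 4.9 GB / 11.7 GB RAM )",
    "10.196.78.225:4501 ( 14% cpu; 6% machine; 0.018 Gbps; 3% disk IO; 2.6 GB / 11.7 GB RAM )",
    "10.196.78.226:4501 ( 1% cpu; 2% machine; 0.003 Gbps; 0% disk IO; 0.3 GB / 11.7 GB RAM )",
]

# Matches "<ip:port> ( N% cpu; ... X GB / Y GB RAM )" as printed above.
PROCESS_LINE = re.compile(
    r"^(?P<addr>\S+)\s+\(\s*(?P<cpu>\d+)% cpu;.*?"
    r"(?P<used>[\d.]+) GB / (?P<total>[\d.]+) GB RAM \)$"
)


def parse_process_line(line):
    """Return address, CPU percent, and RAM figures for one status line."""
    match = PROCESS_LINE.match(line.strip())
    if match is None:
        return None  # not a process line in the expected format
    return {
        "addr": match.group("addr"),
        "cpu_percent": int(match.group("cpu")),
        "ram_used_gb": float(match.group("used")),
        "ram_total_gb": float(match.group("total")),
    }


processes = [p for p in map(parse_process_line, status_lines) if p]

# Processes on port 4501 are the ones configured with class = stateless.
stateless = [p for p in processes if p["addr"].endswith(":4501")]
heaviest = max(stateless, key=lambda p: p["ram_used_gb"])
print(heaviest["addr"], heaviest["ram_used_gb"], "GB")
```

Run over the full listing, this singles out the stateless process using the most memory, which can then be matched against the Machine field in the trace events.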
- From the logs, it looks like the proxy process is using about 4 GB of memory. What could be causing the proxy to use this much memory?
- The logs say the proxy is using about 4 GB of memory, but Linux shows it using about 2.6 GB. Is this expected?
OS - Ubuntu 16.04.6
@alexmiller @ajbeamon - Can you suggest something that could be causing this?