We are running a 6-node FoundationDB cluster and intermittently get a FoundationDB "process degraded" alert, which resolves on its own after about 24 hours on average. The servers are not loaded and there is very little activity on the cluster.
For alerting we use tigrisdata/fdb-exporter (GitHub), which exposes FDB metrics in Prometheus format.
FoundationDB version: 7.3.43-1
Configuration:
Redundancy mode - double
Storage engine - ssd-2
Log engine - ssd-2
Encryption at-rest - disabled
Coordinators - 3
Desired Commit Proxies - 3
Desired GRV Proxies - 1
Desired Resolvers - 1
Desired Logs - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 22
Zones - 6
Machines - 6
Memory availability - 4.0 GB per process on machine with least available
Fault Tolerance - 1 machines
Server time - 06/03/25 07:18:33
Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 49 MB
Disk space used - 984 MB
Operating space:
Storage server - 299.0 GB free on most full server
Log server - 61.3 GB free on most full server
Workload:
Read rate - 30 Hz
Write rate - 1 Hz
Transactions started - 13 Hz
Transactions committed - 1 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Process performance details:
xx.xx.xx.x1:4500 ( 0% cpu; 3% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x1:4501 ( 0% cpu; 3% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x1:4502 ( 1% cpu; 3% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x1:4503 ( 0% cpu; 3% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x2:4500 ( 1% cpu; 2% machine; 0.001 Gbps; 0% disk IO; 0.5 GB / 4.0 GB RAM )
xx.xx.xx.x2:4501 ( 1% cpu; 2% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 4.0 GB RAM )
xx.xx.xx.x2:4502 ( 1% cpu; 2% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x2:4503 ( 1% cpu; 2% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x3:4500 ( 0% cpu; 2% machine; 0.002 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x3:4501 ( 1% cpu; 2% machine; 0.002 Gbps; 1% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x3:4502 ( 1% cpu; 2% machine; 0.002 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x3:4503 ( 0% cpu; 2% machine; 0.002 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x4:4500 ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x4:4501 ( 0% cpu; 1% machine; 0.001 Gbps; 1% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x4:4502 ( 1% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x4:4503 ( 1% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x5:4500 ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x5:4501 ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.2 GB / 4.0 GB RAM )
xx.xx.xx.x5:4502 ( 0% cpu; 1% machine; 0.001 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x6:4500 ( 0% cpu; 1% machine; 0.000 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
xx.xx.xx.x6:4501 ( 0% cpu; 1% machine; 0.000 Gbps; 0% disk IO; 0.5 GB / 4.0 GB RAM )
xx.xx.xx.x6:4502 ( 0% cpu; 1% machine; 0.000 Gbps; 0% disk IO; 0.1 GB / 4.0 GB RAM )
Coordination servers:
xx.xx.xx.x1:4500 (reachable)
xx.xx.xx.x3:4500 (reachable)
xx.xx.xx.x4:4500 (reachable)
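
To narrow down which process triggers the alert, it may help to read the per-process flag directly from `status json` while the alert is firing. Below is a minimal sketch, assuming each entry under `cluster.processes` carries a boolean `degraded` field and an `address` field, as I understand the status JSON schema; please verify both against the output of your 7.3 cluster. The helper names `degraded_processes` and `fetch_status` are made up for this example, and `fetch_status` assumes `fdbcli` is on the PATH, uses the default cluster file, and prints only the JSON document.

```python
import json
import subprocess


def degraded_processes(status):
    """Return the sorted addresses of processes marked degraded.

    `status` is the parsed `status json` document. The `degraded` and
    `address` field names are assumptions about the schema; a process
    without a `degraded` field is treated as not degraded.
    """
    processes = status.get("cluster", {}).get("processes", {})
    return sorted(
        info.get("address", proc_id)  # fall back to the process id key
        for proc_id, info in processes.items()
        if info.get("degraded", False)
    )


def fetch_status():
    """Run `fdbcli --exec 'status json'` and parse its output."""
    result = subprocess.run(
        ["fdbcli", "--exec", "status json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `degraded_processes(fetch_status())` on a machine that has the cluster file returns the list of flagged addresses. Logging that list each time the alert fires would show whether the same process is flagged every time or whether it moves around.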