I have two storage processes on one physical server that keep restarting themselves, with no errors reported.
I have a two-node dev cluster. I noticed that the fdb directory on one machine (laptop A) was not growing in size while loading data.
Checking Grafana, where I monitor fdb_is_storage, I saw that the number of storage servers fluctuates between 2 and 4 about every 30 seconds. (The other two storage servers are on the other machine.)
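The same count can be cross-checked without Grafana by looking at which processes report a storage role in the machine-readable status. This is a minimal sketch, assuming the output of `status json` has already been parsed into a Python dict; the function name and the sample data are hypothetical.

```python
def storage_addresses(status):
    """Return the sorted addresses of processes that currently hold a storage role.

    `status` is assumed to be the parsed output of fdbcli's `status json`.
    """
    processes = status.get("cluster", {}).get("processes", {})
    found = []
    for proc in processes.values():
        roles = proc.get("roles", [])
        # A process can hold several roles; we only care about "storage".
        if any(r.get("role") == "storage" for r in roles):
            found.append(proc.get("address", "<unknown>"))
    return sorted(found)


# Hypothetical, heavily trimmed status document for illustration only.
sample_status = {
    "cluster": {
        "processes": {
            "id-a": {"address": "100.99.239.86:9994", "roles": [{"role": "storage"}]},
            "id-b": {"address": "100.99.239.86:9999", "roles": [{"role": "coordinator"}]},
            "id-c": {"address": "100.105.51.93:9994", "roles": [{"role": "storage"}, {"role": "log"}]},
        }
    }
}
print(storage_addresses(sample_status))
```

Running this every few seconds and comparing the lists would show which addresses drop out and come back.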
Checking server A’s data directories, I can see the sqlite files for the storage servers being deleted and recreated.
There are no Sev 40 events in the log files, and the only Sev 30 events are of these types:
"Type": "MasterRecoveryDuration",
"Type": "TraceEventThrottle_TransactionMetrics",
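A tally like the one above can be produced with a short script. This is a sketch only, assuming the trace files are written in JSON format with one event per line and with `Severity` and `Type` fields; the function name and the sample lines are hypothetical.

```python
import json
from collections import Counter


def count_events_by_type(lines, min_severity=30):
    """Count trace events at or above min_severity, grouped by their Type field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(event, dict):
            continue
        try:
            # Severity may be stored as a string, so convert before comparing.
            severity = int(event.get("Severity", 0))
        except (TypeError, ValueError):
            continue
        if severity >= min_severity:
            counts[event.get("Type", "<unknown>")] += 1
    return counts


# Hypothetical sample lines for illustration only.
sample_lines = [
    '{"Severity": "30", "Type": "MasterRecoveryDuration"}',
    '{"Severity": "10", "Type": "Role"}',
    'not json',
    '{"Severity": "30", "Type": "MasterRecoveryDuration"}',
]
print(count_events_by_type(sample_lines))
```

In practice the `lines` argument would be an open trace file from the log directory. Lowering `min_severity` to 20 would also surface lower-severity events around the time the storage roles disappear.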
I’m perplexed as to how to start troubleshooting this.
Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 11
  Zones                  - 2
  Machines               - 2
  Memory availability    - 1.3 GB per process on machine with least available
                           >>>>> (WARNING: 4.0 GB recommended) <<<<<
  Fault Tolerance        - 0 machines
  Server time            - 01/19/23 13:07:08

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 145.083 GB
  Disk space used        - 170.650 GB

Operating space:
  Storage server         - 499.6 GB free on most full server
  Log server             - 374.6 GB free on most full server

Workload:
  Read rate              - 49 Hz
  Write rate             - 571 Hz
  Transactions started   - 90 Hz
  Transactions committed - 14 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  100.99.239.86:9994 (  1% cpu; 22% machine; 0.000 Gbps;  3% disk IO; 395.2 GB / 1.3 GB RAM )
  100.99.239.86:9995 (  3% cpu; 22% machine; 0.000 Gbps;  4% disk IO; 391.2 GB / 1.3 GB RAM )
  100.99.239.86:9996 (  1% cpu; 22% machine; 0.000 Gbps;  3% disk IO; 395.2 GB / 1.3 GB RAM )
  100.99.239.86:9997 (  2% cpu; 22% machine; 0.000 Gbps;  3% disk IO; 391.1 GB / 1.3 GB RAM )
  100.99.239.86:9998 ( 14% cpu; 22% machine; 0.000 Gbps;  3% disk IO; 391.0 GB / 1.3 GB RAM )
  100.99.239.86:9999 (  1% cpu; 22% machine; 0.000 Gbps;  3% disk IO; 390.6 GB / 1.3 GB RAM )
  100.105.51.93:9994 (  3% cpu; 21% machine; 0.041 Gbps;  0% disk IO;   7.3 GB / 1.8 GB RAM )
  100.105.51.93:9995 (  3% cpu; 21% machine; 0.041 Gbps;  0% disk IO;   0.4 GB / 1.8 GB RAM )
  100.105.51.93:9996 (  3% cpu; 21% machine; 0.041 Gbps;  0% disk IO;   6.3 GB / 1.8 GB RAM )
  100.105.51.93:9997 (  3% cpu; 21% machine; 0.041 Gbps;  0% disk IO;   0.2 GB / 1.8 GB RAM )
  100.105.51.93:9998 ( 29% cpu; 21% machine; 0.041 Gbps;  0% disk IO;   0.2 GB / 1.8 GB RAM )

Coordination servers:
  100.99.239.86:9999  (reachable)

Client time: 01/19/23 13:07:07