Hi,
I’m evaluating FoundationDB for an upcoming project, and I’m seeing way higher latencies than I would expect on writes.
I’m running a 5 node cluster in a local Kubernetes cluster using Kind. The cluster is using a simple statefulset and no process classes specified. I don’t see any significant CPU usage on any of the processes. I am using GitHub - tigrisdata/fdb-exporter: Expose fdb metrics in Prometheus format to export server side metrics.
I have set up a double memory database, and have created a very simple client app in Go that just generates 2 UUIDs and writes them as key-value. After the write, I read back the key and discard the result.
For reads, I’m getting 1-2ms P50 latency, but for writes, P50 is ~15ms, which is quite a lot higher than I would expect.
I have attempted to look at trace files, and have found some indication that LogPush
is taking a long time, but debugging that further or resolving it is beyond me
Logs:
<Event Severity="10" Time="1711640579.008349" DateTime="2024-03-28T15:42:59Z" Type="CommitAttachID" ID="45da311d97d25f27" To="1c6dfe01d0e3124f" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.008349" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.Before" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.008349" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.GettingCommitVersion" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.008816" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.GotCommitVersion" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.AfterResolution" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.ProcessingMutations" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.ApplyMetadaEffect" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.ApplyMetadaEffect" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.ApplyMetadaToCommittedTxn" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.AssignMutationToSS" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.009514" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.AfterStoreCommits" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.017510" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.AfterLogPush" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
<Event Severity="10" Time="1711640579.017958" DateTime="2024-03-28T15:42:59Z" Type="CommitDebug" ID="1c6dfe01d0e3124f" Location="CommitProxyServer.commitBatch.AfterReportRawCommittedVersion" Machine="10.244.0.17:4500" LogGroup="default" Roles="CC,CD,CP,SS" />
Cluster status:
[root@fdb-kubernetes-example-0 /]# fdbcli
Using cluster file `/var/fdb/data/fdb.cluster'.
The database is available.
Welcome to the fdbcli. For help, type `help'.
fdb> status
Using cluster file `/var/fdb/data/fdb.cluster'.
Configuration:
Redundancy mode - double
Storage engine - memory-2
Coordinators - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 5
Zones - 5
Machines - 5
Memory availability - 5.3 GB per process on machine with least available
Retransmissions rate - 2 Hz
Fault Tolerance - 1 zones
Server time - 03/28/24 15:46:46
Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 62 MB
Disk space used - 829 MB
Operating space:
Storage server - 0.9 GB free on most full server
Log server - 398.2 GB free on most full server
Workload:
Read rate - 161 Hz
Write rate - 155 Hz
Transactions started - 312 Hz
Transactions committed - 155 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Client time: 03/28/24 15:46:46
Latency statistics from status json
:
"commit_latency_statistics" : {
"count" : 2756,
"max" : 0.089341599999999993,
"mean" : 0.010652,
"median" : 0.010372900000000001,
"min" : 0.0037507999999999999,
"p25" : 0.009155990000000001,
"p90" : 0.0132411,
"p95" : 0.014465800000000001,
"p99" : 0.017585,
"p99.9" : 0.049426100000000001
},
Hope someone is able to point me in the right direction.