I’ve been seeing a huge impact on Ratekeeper’s limits when the number of write operations increases (~2x the number of write transactions), which affects our write throughput and increases latency in the system. We keep transaction sizes below 1 MB (around 250 records per transaction) and key/value sizes below the Known Limitations suggestions.
As can be seen in the graph below, the number of available transactions/s drops from ~50M ops/s to around 3M ops/s. We are querying fdb_cluster_qos_transactions_per_second_limit for the available tx/s and fdb_cluster_qos_released_transactions_per_second for the released tx/s.
(Graph: fdb_cluster_qos_transactions_per_second_limit vs fdb_cluster_qos_released_transactions_per_second)
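For reference, those exporter metrics appear to be scraped from the cluster.qos section of status json, so the same numbers can be pulled straight from the cluster. A minimal sketch, assuming fdbcli is on the PATH and the default cluster file is used:

```python
import json
import subprocess

# "status json" returns the machine-readable status document that the
# qos exporter metrics are (as far as I can tell) derived from.
raw = subprocess.check_output(["fdbcli", "--exec", "status json"])
qos = json.loads(raw)["cluster"]["qos"]

print("tx/s limit:   ", qos["transactions_per_second_limit"])
print("released tx/s:", qos["released_transactions_per_second"])
```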
The storage queue (fdb_cluster_processes_roles_storage_query_queue_max) is also affected by the increased writes, going from ~15 to 500+ (high spikes rather than a constant change). The amount of data received grows as well, from around 12 Mbps to 25 Mbps (fdb_cluster_processes_network_megabits_received_hz), which is expected since we are nearly doubling the amount of data written.
And although disk operations increase (fdb_cluster_processes_disk_writes_hz and fdb_cluster_processes_disk_reads_hz), the disks don’t seem to be anywhere near stressed, as fdb_cluster_processes_disk_busy stays below 12%.
It seems Ratekeeper was throttling, so you should start by looking at the RkUpdate trace events in its log, where each entry has something like ID="c78b41de2e8e417d" TPSLimit="45698.1" Reason="2". The Reason field is the one we are looking for; it corresponds to the values defined here.
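In case it helps, here’s a rough sketch of how you might tally those Reason values across the Ratekeeper process’s trace files. It assumes the default XML trace format (one event per line) and a hypothetical log directory, so adjust the path, and adapt the parsing if you log in JSON format:

```python
import glob
import re
from collections import Counter

# Hypothetical location of the Ratekeeper process's trace files; adjust as needed.
TRACE_GLOB = "/var/log/foundationdb/trace.*.xml"

# Match RkUpdate events and pull out their attributes so we can tally Reason.
event_re = re.compile(r'<Event [^>]*Type="RkUpdate"[^>]*/?>')
attr_re = re.compile(r'(\w+)="([^"]*)"')

reasons = Counter()
for path in glob.glob(TRACE_GLOB):
    with open(path) as f:
        for line in f:
            m = event_re.search(line)
            if not m:
                continue
            attrs = dict(attr_re.findall(m.group(0)))
            reasons[attrs.get("Reason", "?")] += 1

for reason, count in reasons.most_common():
    print(f"Reason={reason}: {count} RkUpdate events")
```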
Hi Steve, thank you for getting back on this. Here’s the data you requested.
Which FDB version are you using?
FoundationDB 6.2 (v6.2.15)
source version 20566f2ff06a7e822b30e8cfd91090fbd863a393
protocol fdb00b062010001
Which storage engine?
Configuration:
Redundancy mode - double
Storage engine - ssd-2
Coordinators - 6
Exclusions - 31 (type `exclude' for details)
Desired Proxies - 12
Desired Resolvers - 8
Desired Logs - 12
Cluster:
FoundationDB processes - 308 (less 2 excluded; 0 with errors)
Zones - 24
Machines - 24
Memory availability - 5.5 GB per process on machine with least available
Retransmissions rate - 1 Hz
Fault Tolerance - 1 machine
Server time - 02/01/24 17:33:58
Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 4.770 TB
Disk space used - 12.933 TB
Operating space:
Storage server - 1338.7 GB free on most full server
Log server - 826.1 GB free on most full server
Workload:
Read rate - 127406 Hz
Write rate - 6424 Hz
Transactions started - 100350 Hz
Transactions committed - 233 Hz
Conflict rate - 9 Hz
What is the key locality of your writes? Are they random/scattered or are they mostly sequential/adjacent?
Mostly random/scattered, as the keys are a UUID + app id combination.
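(To illustrate what I mean by scattered, the writes look roughly like the sketch below with the Python bindings; the app id, payload, and batch size are made up, but each new UUID lands at an effectively random point in the keyspace.)

```python
import uuid
import fdb

fdb.api_version(620)  # API version matching the 6.2 cluster above
db = fdb.open()

@fdb.transactional
def write_batch(tr, app_id, records):
    # (app_id, random UUID) keys scatter each record across the keyspace,
    # so a single commit touches many unrelated shards/storage servers.
    for value in records:
        key = fdb.tuple.pack((app_id, uuid.uuid4().bytes))
        tr[key] = value

# Illustrative only: app id, payload, and the ~250-record batch are examples.
write_batch(db, u"my_app", [b"payload"] * 250)
```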
What kind of disks are you using?
NVME (instance store)
Are you placing >1 storage server on a disk volume?
Yes
Does each storage server have 1 physical CPU core to use?
Yes, we don’t see anywhere near full CPU/disk usage as shown in the original post.
Random writes are a worst-case IO pattern, and the ssd-2 engine does not handle them well because they incur many serial disk latencies on the write path. You could therefore be IO bound on the Storage Servers, which causes their Storage Queue to build up and Ratekeeper to limit. As @jzhou said, you should look at the RkUpdate events to see what the limiting reason is.
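As a quicker cross-check alongside the trace-log approach, status json also reports the worst storage queue and the current limiting reason under cluster.qos. A small sketch along the same lines as the snippet above (field names are from the machine-readable status; adjust if your version differs):

```python
import json
import subprocess

# Read the qos section of status json and report what Ratekeeper says is limiting.
qos = json.loads(
    subprocess.check_output(["fdbcli", "--exec", "status json"])
)["cluster"]["qos"]

limited = qos.get("performance_limited_by", {})
print("limiting reason:", limited.get("name"), "-", limited.get("description"))
print("worst storage queue bytes:", qos.get("worst_queue_bytes_storage_server"))
```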