Ratekeeper limits dropping substantially with bulk writes

Hi all!

We’ve been seeing a huge impact on Ratekeeper’s limits when the number of write operations increases (roughly 2x the number of write transactions), which hurts our write throughput and increases latency across the system. We are keeping transaction sizes below 1 MB (around 250 records per transaction) and key/value sizes below the Known Limitations recommendations.
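
To make the write pattern concrete, this is roughly the shape of the batching we do (a minimal sketch using the Python binding; the helper names, record source, and key layout are illustrative, not our actual code):

import fdb

fdb.api_version(620)
db = fdb.open()

@fdb.transactional
def write_batch(tr, records):
    # One transaction per batch of ~250 records, keeping each commit
    # well under the 1 MB recommended transaction size.
    for key, value in records:
        tr[key] = value

def write_all(db, records, batch_size=250):
    # Illustrative batching helper: flush every batch_size records.
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            write_batch(db, batch)
            batch = []
    if batch:
        write_batch(db, batch)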

As can be seen in the graph below, the number of available transactions/s goes down from ~50M to around 3M. We are querying fdb_cluster_qos_transactions_per_second_limit for the available tx/s and fdb_cluster_qos_released_transactions_per_second for the released tx/s.
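
Those exporter metrics appear to map to the cluster’s status JSON; for cross-checking, the same numbers can be read directly from the \xff\xff/status/json special key (a small sketch, assuming the standard Python binding):

import json
import fdb

fdb.api_version(620)
db = fdb.open()

# The full machine-readable status is exposed through a special key.
status = json.loads(db[b'\xff\xff/status/json'])
qos = status['cluster']['qos']

print('tps limit:   ', qos['transactions_per_second_limit'])
print('released tps:', qos['released_transactions_per_second'])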


fdb_cluster_qos_transactions_per_second_limit and fdb_cluster_qos_released_transactions_per_second

The storage queue (fdb_cluster_processes_roles_storage_query_queue_max) is also affected by the increased write volume, going from ~15 to 500+ (high spikes rather than a constant change), and the amount of data received grows as well, from around 12 Mbps to 25 Mbps (fdb_cluster_processes_network_megabits_received_hz), which is expected since we are nearly doubling the amount of data written.
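
For the storage servers themselves, the per-process numbers behind that graph can be pulled from the same status JSON (a sketch; the field names assume the usual storage-role entries, with queue depth taken as input bytes minus durable bytes):

import json
import fdb

fdb.api_version(620)
db = fdb.open()
status = json.loads(db[b'\xff\xff/status/json'])

for proc_id, proc in status['cluster']['processes'].items():
    for role in proc.get('roles', []):
        if role.get('role') == 'storage':
            # Storage queue = bytes received but not yet durable on disk.
            queue = role['input_bytes']['counter'] - role['durable_bytes']['counter']
            print(proc.get('address', proc_id),
                  'query_queue_max =', role.get('query_queue_max'),
                  'storage_queue_bytes =', queue)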


fdb_cluster_processes_roles_storage_query_queue_max


fdb_cluster_processes_network_megabits_received_hz

And though disk operations increase (fdb_cluster_processes_disk_writes_hz and fdb_cluster_processes_disk_reads_hz), the disks don’t seem to be anywhere near stressed, as fdb_cluster_processes_disk_busy stays below 12%.


fdb_cluster_processes_disk_busy


fdb_cluster_processes_disk_writes_hz


fdb_cluster_processes_disk_reads_hz

The RPS on the client app decreases and latency increases as soon as we start writing more data and the Ratekeeper limits kick in.

Any pointers on how to configure/increase the Ratekeeper limits and/or increase write throughput would be greatly appreciated!

Please feel free to ask for any information I might have missed that could be useful for addressing this issue.

Best regards!

It seems Ratekeeper was throttling, so you should start by looking at the RkUpdate trace events in its log, where each entry has something like ID="c78b41de2e8e417d" TPSLimit="45698.1" Reason="2". The Reason is what we are looking for; it corresponds to the values defined here.
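
Something like this can pull those events out of the trace files of the process running the ratekeeper role (a rough sketch; it assumes the default XML trace format with one event per line, the log path will need adjusting, and the numeric Reason still has to be matched against limitReason_t in fdbserver/Ratekeeper.actor.cpp for your version):

import glob
import re

# Adjust to the trace log directory of the ratekeeper process.
TRACE_GLOB = '/var/log/foundationdb/trace.*.xml'

def attr(line, name):
    # Extract a single attribute value from an XML trace event line.
    m = re.search(name + r'="([^"]*)"', line)
    return m.group(1) if m else None

for path in sorted(glob.glob(TRACE_GLOB)):
    with open(path) as f:
        for line in f:
            if 'Type="RkUpdate"' in line:
                print(attr(line, 'Time'),
                      'TPSLimit =', attr(line, 'TPSLimit'),
                      'Reason =', attr(line, 'Reason'))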


Questions

  • Which FDB version are you using?
  • Which storage engine?
  • What is the key locality of your writes? Are they random/scattered or are they mostly sequential/adjacent?
  • What kind of disks are you using?
  • Are you placing >1 storage server on a disk volume?
  • Does each storage server have 1 physical CPU core to use?

Hi Steve, thank you for getting back on this. Here’s the data you requested.

  • Which FDB version are you using?
FoundationDB 6.2 (v6.2.15)
source version 20566f2ff06a7e822b30e8cfd91090fbd863a393
protocol fdb00b062010001
  • Which storage engine?
Configuration:
  Redundancy mode        - double
  Storage engine         - ssd-2
  Coordinators           - 6
  Exclusions             - 31 (type `exclude' for details)
  Desired Proxies        - 12
  Desired Resolvers      - 8
  Desired Logs           - 12

Cluster:
  FoundationDB processes - 308 (less 2 excluded; 0 with errors)
  Zones                  - 24
  Machines               - 24
  Memory availability    - 5.5 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 1 machine
  Server time            - 02/01/24 17:33:58

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 4.770 TB
  Disk space used        - 12.933 TB

Operating space:
  Storage server         - 1338.7 GB free on most full server
  Log server             - 826.1 GB free on most full server

Workload:
  Read rate              - 127406 Hz
  Write rate             - 6424 Hz
  Transactions started   - 100350 Hz
  Transactions committed - 233 Hz
  Conflict rate          - 9 Hz
  • What is the key locality of your writes? Are they random/scattered or are they mostly sequential/adjacent?
    Mostly random/scattered as they are UUIDs + app id combination

  • What kind of disks are you using?
    NVME (instance store)

  • Are you placing >1 storage server on a disk volume?
    Yes

  • Does each storage server have 1 physical CPU core to use?
    Yes, we don’t see anywhere near full CPU/disk usage as shown in the original post.

Random writes are a worst-case IO pattern, and the ssd-2 engine does not handle them well because it incurs many serial disk latencies on the write path. So you could be IO bound on the Storage Servers, which causes their Storage Queue to build up and Ratekeeper to limit. As @jzhou said, you should look at the RkUpdate events to see what the limiting reason is.
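
As a quicker cross-check before digging through trace files, status json also reports Ratekeeper’s current limiting reason directly (a small sketch reading cluster.qos.performance_limited_by via the Python binding):

import json
import fdb

fdb.api_version(620)
db = fdb.open()
status = json.loads(db[b'\xff\xff/status/json'])

# Ratekeeper's current limiting reason, as surfaced by the cluster.
limited_by = status['cluster']['qos'].get('performance_limited_by', {})
print('name:       ', limited_by.get('name'))
print('description:', limited_by.get('description'))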