Ratekeeper - MAX_TL_SS_VERSION_DIFFERENCE disabled?

In the server knobs (fdbclient/ServerKnobs.cpp in apple/foundationdb on GitHub), MAX_TL_SS_VERSION_DIFFERENCE is effectively disabled with a value of 1e99.

Doesn’t this prevent Ratekeeper from protecting the cluster when storage servers are unable to keep up with writes?

Is there a reason to not set it to something like 10e6?

The Ratekeeper now uses the storage server (SS) and TLog queue sizes. The SS queue represents how much data the storage server still has to apply, and the TLog queue represents how much is held in memory because not all SSes have consumed it yet.

The following knobs set the “soft” limit on how much can be queued:

  • TARGET_BYTES_PER_STORAGE_SERVER (1GB)
  • TARGET_BYTES_PER_STORAGE_SERVER_BATCH (750MB)
  • TARGET_BYTES_PER_TLOG (2.4GB)
  • TARGET_BYTES_PER_TLOG_BATCH (1.4GB)

The batch limits are separate and will start throttling batch-priority transactions earlier.

The Ratekeeper will not start slowing down clients until the queue enters the “spring” zone, that is, until less than this much of the soft budget is left:

  • SPRING_BYTES_STORAGE_SERVER (100MB)
  • SPRING_BYTES_STORAGE_SERVER_BATCH (100MB)
  • SPRING_BYTES_TLOG (400MB)
  • SPRING_BYTES_TLOG_BATCH (300MB)
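
As a rough illustration of the two lists above, the queue size at which slowing down begins can be estimated as the soft limit minus the spring. This is only a sketch under that assumption, not the Ratekeeper's actual code: the helper name `throttle_start_mb` and the MB units are made up for the example, and the numbers are the ones quoted in this post.

```python
# Knob values as quoted in this post, in MB (1GB taken as 1000MB).
TARGETS_MB = {
    "STORAGE_SERVER": 1000,
    "STORAGE_SERVER_BATCH": 750,
    "TLOG": 2400,
    "TLOG_BATCH": 1400,
}
SPRINGS_MB = {
    "STORAGE_SERVER": 100,
    "STORAGE_SERVER_BATCH": 100,
    "TLOG": 400,
    "TLOG_BATCH": 300,
}


def throttle_start_mb(name):
    """Hypothetical helper: queue size (MB) where slowing down would begin,
    assuming the spring zone sits directly below the soft limit."""
    return TARGETS_MB[name] - SPRINGS_MB[name]


for name in TARGETS_MB:
    print(f"{name}: throttling starts near {throttle_start_mb(name)}MB")
```

Under this reading, the batch entries reach their spring zone earlier than the default-priority ones, which matches the note that batch-priority transactions get throttled first.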

There are also some other knobs:

  • STORAGE_HARD_LIMIT_BYTES (1.5GB): The hard limit on how much a storage server can queue. If this limit is exceeded, the SS will stop reading from the TLog until sufficient progress is made.
  • TLOG_HARD_LIMIT_BYTES (3GB): The hard limit on how much a TLog can keep in memory. This only kicks in if the spill process is not fast enough, and will block all new queue operations until spilling brings the queue back under the limit.
  • TLOG_SPILL_THRESHOLD (1.5GB): The TLog will start spilling logs to disk if there is more than this much queued.
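
To make the ordering of these thresholds concrete, here is a small sketch that classifies a queue size against them. It is purely illustrative: the function names and state labels are invented for the example, decimal megabytes are assumed, and the real TLog and storage server logic is more involved than a pair of comparisons.

```python
MB = 1_000_000  # decimal megabytes, an assumption about the units quoted above

# Values as quoted in this post.
STORAGE_HARD_LIMIT_BYTES = 1500 * MB
TLOG_SPILL_THRESHOLD = 1500 * MB
TLOG_HARD_LIMIT_BYTES = 3000 * MB


def tlog_state(queued_bytes):
    """Classify a TLog by how much it holds in memory (illustrative labels)."""
    if queued_bytes > TLOG_HARD_LIMIT_BYTES:
        return "blocked"    # new queue operations wait for spilling to catch up
    if queued_bytes > TLOG_SPILL_THRESHOLD:
        return "spilling"   # logs are being written out to disk
    return "in-memory"


def ss_reads_from_tlog(queued_bytes):
    """A storage server stops pulling from the TLog once over its hard limit."""
    return queued_bytes <= STORAGE_HARD_LIMIT_BYTES


print(tlog_state(2000 * MB))           # between the spill threshold and the hard limit
print(ss_reads_from_tlog(1600 * MB))   # over the storage hard limit
```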

TLog spilling is needed because when an SS fails, any mutations destined for it are kept queued while the Data Distributor replaces it, in case it comes back online (say, after a restart).
In that case, the TLog has to hold far more than the usual short 5s window of data, and that may not fit in memory.

(The on-disk circular log is only used for crash recovery.)
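
A back-of-the-envelope calculation shows why the backlog for a failed SS outgrows memory. The write rate and outage length below are hypothetical numbers picked for the example; only the 5s window and the 1.5GB spill threshold come from this post.

```python
TLOG_SPILL_THRESHOLD_MB = 1500  # value quoted above

# Hypothetical inputs, chosen only for illustration.
WRITE_RATE_MB_PER_S = 50   # mutations destined for one storage server
OUTAGE_SECONDS = 60        # how long the failed SS stays away


def backlog_mb(rate_mb_per_s, seconds):
    """Data the TLog must hold if nothing is consumed for `seconds`."""
    return rate_mb_per_s * seconds


normal = backlog_mb(WRITE_RATE_MB_PER_S, 5)               # usual 5s window
outage = backlog_mb(WRITE_RATE_MB_PER_S, OUTAGE_SECONDS)  # failed SS

print(f"5s window: {normal}MB, spill needed: {normal > TLOG_SPILL_THRESHOLD_MB}")
print(f"{OUTAGE_SECONDS}s outage: {outage}MB, spill needed: {outage > TLOG_SPILL_THRESHOLD_MB}")
```

With these made-up inputs the normal window stays well under the spill threshold, while a one-minute outage already exceeds it.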

@Semisol isn’t TARGET_BYTES_PER_STORAGE_SERVER actually 1GB? 1000e6 bytes?

That was an error, fixed :)