Questions about storage server on-disk format

I have a few questions about how storage server processes store data on disk. We ran into a scenario where a development FDB cluster operating in a single-machine, single-process configuration (ssd-2) became unavailable. (Not safe for a production configuration, I know…)

The root cause seems to have been that the storage server's sqlite file grew larger than the free space remaining on the system (e.g. the sqlite file is 5GB while free space is 4.5GB). In this scenario, I assume FDB can't continue because it needs to somehow make a copy of the whole storage server file and doesn't have the free space to do so.

The questions I have about this are:

  1. Is my assumption that the storage server process was trying to copy/move its sqlite database file correct? If so, what FDB process is causing this behaviour? (or, why does the storage server need to do this?)

  2. I know a storage server stores data within an sqlite database (storage-xxxxxxx.sqlite) in the process’ data directory. Does a storage server always use one single sqlite file, regardless of how large it gets, or does a storage server split into multiple sqlite files when the file size gets too large? (If so, when does that split happen, and is it configurable?)

  3. Would running multiple storage servers (still on a single machine) help FDB better use all the available disk space on the system before failing? If so, how much would the sqlite files shrink when adding additional storage servers in an ssd-2 configuration?

    • E.g. if the sqlite file for 1 SS is 5GB, my intuition would be that:
      • 2 SSes would also each use 5GB of space (for a total of 10GB) (since data would be replicated twice)
      • but 3 SSes would use 3.3GB each (since data would only be on 2 SSes each)
      • and more SSes would use less (a quick sketch of this arithmetic follows the list)
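
For concreteness, here is the back-of-the-envelope arithmetic behind that intuition, assuming double replication and ignoring per-file overhead and unreclaimed free pages (a sketch of my reasoning only, not anything FDB itself computes):

```python
# Hypothetical helper, not part of FDB: rough per-SS footprint under replication.
def per_ss_bytes(logical_bytes: float, replicas: int, num_ss: int) -> float:
    # Each key lives on `replicas` of the `num_ss` storage servers,
    # so each SS holds roughly logical_bytes * replicas / num_ss.
    return logical_bytes * replicas / num_ss

logical = 5e9  # ~5 GB of logical data (one copy)
for n in (2, 3, 4):
    print(f"{n} SSes -> {per_ss_bytes(logical, replicas=2, num_ss=n) / 1e9:.2f} GB each")
# 2 SSes -> 5.00 GB each, 3 SSes -> 3.33 GB each, 4 SSes -> 2.50 GB each
```
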
  1. Storage servers do not copy or move their sqlite files. The database most likely became unavailable because Ratekeeper throttles traffic when free disk space drops below MIN_AVAILABLE_SPACE=1e8 bytes or MIN_AVAILABLE_SPACE_RATIO=0.05 (i.e., 5% of the disk); a rough sketch of this threshold check follows these answers.
  2. An SS always uses a single sqlite file, growing it as needed (300+ GB in some of our clusters).
  3. You want to spread SSes across different disks. You can put two or more SSes on one disk, but the limits described above will still apply.
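
Here is that free-space condition written out, using the knob values quoted above; this is an illustrative Python sketch rather than FDB source, and the real Ratekeeper behaviour is more nuanced (it reduces throughput progressively as space runs low rather than flipping a single switch):

```python
# Rough sketch of the low-disk condition implied by the knobs above.
MIN_AVAILABLE_SPACE = 1e8          # bytes (~100 MB)
MIN_AVAILABLE_SPACE_RATIO = 0.05   # 5% of the disk

def is_critically_low(free_bytes: float, total_bytes: float) -> bool:
    threshold = max(MIN_AVAILABLE_SPACE, MIN_AVAILABLE_SPACE_RATIO * total_bytes)
    return free_bytes < threshold

# Example: 0.3 GB free on a 10 GB disk -> below the 0.5 GB (5%) threshold.
print(is_critically_low(free_bytes=0.3e9, total_bytes=10e9))  # True
```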

BTW, because of lazy deletion, an SS needs vacuuming to reclaim space, and regaining that free space can be slow.
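
As a side note on what "reclaimable" means here: in a plain sqlite database, deleted data leaves free pages inside the file, and the file only shrinks once those pages are vacuumed away. The snippet below shows how free pages would be inspected in stock sqlite; FDB ships a modified sqlite and handles lazy deletion/vacuuming itself, so treat this as an illustration, not a supported way to inspect a storage-*.sqlite file.

```python
import sqlite3

def reclaimable_bytes(path: str) -> int:
    """Bytes inside a plain sqlite file that hold no live data (free pages)."""
    con = sqlite3.connect(path)
    try:
        page_size = con.execute("PRAGMA page_size").fetchone()[0]
        free_pages = con.execute("PRAGMA freelist_count").fetchone()[0]
        return page_size * free_pages
    finally:
        con.close()
```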

Re: your point 1. If the database was being throttled by Ratekeeper, would we expect to see a failed write in the form of an io_error? This is the error message we encountered:

<Event
  Severity="10"
  Time="1733147627.698392"
  DateTime="2024-12-02T13:53:47Z"
  Type="AsyncFileKAIOAllocateError"
  ID="0000000000000000"
  Fd="27"
  Filename="/var/db/foundationdb/data/4500/storage-31ac87ee1a8d008b62929c9c3bf68818.sqlite"
  Size="5347737600"
  UnixErrorCode="1c"
  UnixError="No space left on device"
  ThreadID="14924332259347293433"
  Machine="127.0.0.1:4500"
  LogGroup="default"
  Roles="CD,SS,TL"
/>

This error says there is no disk space available, which is serious because the cluster can't write anything. Even deleting data needs to temporarily write some metadata to disk.

The problem seems to be that throttling does not take effect quickly enough, which is a known issue, so the cluster can reach this state. For production clusters, a free-space alert should be set up for this reason.
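
For example, a minimal free-space check could parse the machine-readable status. The field names below (cluster.processes[<id>].disk.free_bytes / total_bytes) reflect my understanding of the status json schema, so verify them against your FDB version:

```python
import json
import subprocess

ALERT_RATIO = 0.15  # alert well before Ratekeeper's ~5% threshold

def check_free_space(cluster_file: str = "/etc/foundationdb/fdb.cluster") -> None:
    out = subprocess.run(
        ["fdbcli", "-C", cluster_file, "--exec", "status json"],
        capture_output=True, text=True, check=True,
    ).stdout
    status = json.loads(out)
    for pid, proc in status["cluster"]["processes"].items():
        disk = proc.get("disk", {})
        free, total = disk.get("free_bytes"), disk.get("total_bytes")
        if free is not None and total and free / total < ALERT_RATIO:
            print(f"ALERT: {proc.get('address', pid)} has only "
                  f"{free / 1e9:.2f} GB free of {total / 1e9:.2f} GB")

if __name__ == "__main__":
    check_free_space()
```

Running something like this from cron, or feeding the same fields into an existing monitoring stack, would provide the free-space alert suggested above.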