Questions about storage server on-disk format

I have a few questions about how storage server processes store data on-disk. We experienced a scenario where a development FDB cluster running in a single-machine, single-process configuration (ssd-2) became unavailable. (Not safe for a production configuration, I know…)

The root cause seems to have been that the storage server sqlite file grew to be greater than the available free space on the system. (E.g. sqlite file is 5GB, free space is 4.5GB) In this scenario, I assume that FDB isn’t able to continue because it needs to somehow make a copy of the whole storage server file, and it doesn’t have the free space to do so.

The questions I have about this are:

  1. Is my assumption that the storage server process was trying to copy/move its sqlite database file correct? If so, what FDB process is causing this behaviour? (or, why does the storage server need to do this?)

  2. I know a storage server stores data within an sqlite database (storage-xxxxxxx.sqlite) in the process’ data directory. Does a storage server always use one single sqlite file, regardless of how large it gets, or does a storage server split into multiple sqlite files when the file size gets too large? (If so, when does that split happen, and is it configurable?)

  3. Would running multiple storage servers (still on a single machine) help FDB better use all the available disk space on the system before failing? If so, how much would the sqlite files shrink when adding additional storage servers in an ssd-2 configuration?

    • E.g. if the sqlite file for 1 SS is 5GB, my intuition would be that:
      • 2 SSes would also each use 5GB of space (for a total of 10GB) (since data would be replicated twice)
      • but 3 SSes would use 3.3GB each (since data would only be on 2 SSes each)
      • and more SSes would use less
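That intuition can be written down as a quick back-of-the-envelope sketch. This assumes perfectly even data distribution and that each key ends up on min(number of SSes, replication factor) servers; real clusters will deviate from this, and `per_ss_gb` is just an illustrative helper, not anything from FDB itself:

```python
def per_ss_gb(logical_gb: float, replication: int, num_ss: int) -> float:
    # Each key is stored on min(num_ss, replication) servers;
    # copies are assumed to spread perfectly evenly across the SSes.
    copies = min(num_ss, replication)
    return logical_gb * copies / num_ss

# The thread's example: 5 GB of logical data, double replication (ssd-2)
for n in (1, 2, 3, 6):
    print(n, round(per_ss_gb(5.0, 2, n), 2))
# 1 -> 5.0, 2 -> 5.0, 3 -> 3.33, 6 -> 1.67
```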

  1. Storage servers do not copy or move sqlite files. The database most likely became unavailable because Ratekeeper throttles traffic when free disk space drops below MIN_AVAILABLE_SPACE=1e8 bytes or MIN_AVAILABLE_SPACE_RATIO=0.05 (i.e., 5%).
  2. An SS always uses a single sqlite file and grows it as needed (300+ GB in some of our clusters).
  3. You want to spread SSes across different disks. You can put two or more SSes on one disk, but the limits above still apply to the shared disk.
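Assuming those knobs work as their names suggest (i.e., the effective floor is whichever of the absolute and ratio limits is larger — a reading of the description above, not verified against the source), the point where throttling kicks in can be sketched as:

```python
def ratekeeper_free_space_floor(total_bytes: int) -> int:
    # Sketch of the two knobs named above; assumes the effective floor
    # is the larger of the absolute and ratio-based limits.
    MIN_AVAILABLE_SPACE = int(1e8)        # 100 MB absolute floor
    MIN_AVAILABLE_SPACE_RATIO = 0.05      # 5% of the volume
    return max(MIN_AVAILABLE_SPACE, int(total_bytes * MIN_AVAILABLE_SPACE_RATIO))

# On a 10 GB volume like the one in this thread, throttling would start
# at roughly 512 MiB free:
print(ratekeeper_free_space_floor(10 * 1024**3))
```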

BTW, because of lazy deletion, an SS needs vacuuming to reclaim space, so regaining free space after deletes can be slow.
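The same effect is easy to demonstrate with stock sqlite3 (FDB ships a modified SQLite engine, so this is an analogy rather than FDB's exact behaviour): deleting rows only moves pages to the freelist, and the file does not shrink until a VACUUM rewrites it.

```python
import os
import sqlite3
import tempfile

# Create a ~5 MB database, delete everything, and compare file sizes.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
db = sqlite3.connect(path)
db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
db.executemany("INSERT INTO kv VALUES (?, ?)",
               ((i, b"x" * 1024) for i in range(5000)))
db.commit()
size_full = os.path.getsize(path)

db.execute("DELETE FROM kv")
db.commit()
size_after_delete = os.path.getsize(path)   # file size is unchanged

db.execute("VACUUM")                        # rewrites the file from scratch
size_after_vacuum = os.path.getsize(path)   # now the space is returned
db.close()

print(size_full, size_after_delete, size_after_vacuum)
```

Note that VACUUM itself temporarily needs extra disk space for the rewritten copy, which is exactly why reclaiming space on an already-full volume is painful.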


Re: your point 1 — if the database was throttled by Ratekeeper, would we expect to see a failed write in the form of an io_error? This is the error message we encountered:

<Event
  Severity="10"
  Time="1733147627.698392"
  DateTime="2024-12-02T13:53:47Z"
  Type="AsyncFileKAIOAllocateError"
  ID="0000000000000000"
  Fd="27"
  Filename="/var/db/foundationdb/data/4500/storage-31ac87ee1a8d008b62929c9c3bf68818.sqlite"
  Size="5347737600"
  UnixErrorCode="1c"
  UnixError="No space left on device"
  ThreadID="14924332259347293433"
  Machine="127.0.0.1:4500"
  LogGroup="default"
  Roles="CD,SS,TL"
/>
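(The UnixErrorCode in that event is hex; decoding it against the standard Linux errno table confirms it is ENOSPC:)

```python
import errno
import os

code = int("1c", 16)   # UnixErrorCode from the trace event, given in hex
print(code, errno.errorcode[code], os.strerror(code))
# 28 ENOSPC No space left on device (on Linux)
```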

This error says no disk space is available, which is serious because the cluster can’t write anything. Even deleting data needs to temporarily write some metadata to disk.

The problem seems to be that throttling does not take effect quickly enough, which is a known issue, so the cluster can reach this state. For production clusters, a free-space alert should be set up for this reason.
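A minimal sketch of such an alert, using `os.statvfs`; the path and the 10% threshold are placeholders to adjust for your deployment (alerting well above Ratekeeper's 5% floor leaves time to react):

```python
import os

def free_space_ratio(path: str) -> float:
    """Fraction of the filesystem at `path` still available to new writes."""
    st = os.statvfs(path)
    return st.f_bavail / st.f_blocks

# Point this at the FDB data volume, e.g. "/var/db/foundationdb/data".
ratio = free_space_ratio("/")
if ratio < 0.10:
    print(f"ALERT: only {ratio:.1%} free on the FDB data volume")
```

Run it from cron or wire the same check into whatever monitoring system you already have.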


@jzhou it looks like this got lost in some other problems I was researching last December, but I just encountered this again. Same situation, single-node FDB instance, the data directories are stored on a 10GB volume, but as soon as the SS sqlite file got to 5GB, the node ground to a halt.

What’s the cause for this? There’s still approximately 4.5GB of space available on that volume, so why am I getting that No space left on device error? We’re running 7.2.

Can you share the exact trace log entry from that time, the output of df and df -i, and a listing of all files in the data directory (with sizes)?

Hi @Semisol,

Here’s an example of the log entry we were encountering. This was preventing the (single-node) server from starting:

{
  "description": "StorageServerFailed: io_error at Mon Aug 11 17:07:45 2025",
  "name": "io_error",
  "raw_log_message": "\"Severity\"=\"40\", \"ErrorKind\"=\"DiskIssue\", \"Time\"=\"1754932065.768774\", \"DateTime\"=\"2025-08-11T17:07:45Z\", \"Type\"=\"StorageServerFailed\", \"ID\"=\"c20566c16c046b54\", \"Error\"=\"io_error\", \"ErrorDescription\"=\"Disk i/o operation failed\", \"ErrorCode\"=\"1510\", \"Reason\"=\"Error\", \"ThreadID\"=\"5459356763039781762\", \"Backtrace\"=\"addr2line -e fdbserver.debug -p -C -f -i 0x43f9e3c 0x43f8a80 0x43f8e6e 0x2730691 0x2730dec 0x2730fa5 0x26fd38b 0x26fd612 0xd25649 0x27208c6 0x2720b95 0x12a12a0 0x12a1562 0xd25649 0x2710ec4 0x2710fbc 0x12a12a0 0x12a1562 0xd25649 0x17487af 0x15a3040 0x438bb32 0xc82397 0x7fa63d8ab083\", \"Machine\"=\"127.0.0.1:4500\", \"LogGroup\"=\"default\", \"Roles\"=\"SS,TL\"",
  "time": 1754930000,
  "type": "StorageServerFailed"
}

I did run df -i during the broken state, and confirmed we weren’t out of inodes:

$ df -i
Filesystem             Inodes  IUsed   IFree IUse% Mounted on
/dev/mapper/data-data 5240832     68 5240764    1% /var/db

I don’t have an exact capture of df during the broken state, but here it is after I increased the size of the data volume:

$ df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/data-data   15G  5.4G  9.7G  36% /var/db

Before I increased the size of the data volume, it was 10GB. I believe usage was hovering at around 4.9GB or 5.0GB, so 50% of the volume was available for use. I also don’t have an ls output from exactly when the issue was occurring, but here’s the data volume now.

$ ls -lah
total 5.2G
drwxr-xr-x 2 foundationdb foundationdb 4.0K Aug 11 18:22 .
drwxr-xr-x 3 foundationdb foundationdb   18 Feb  2  2024 ..
-rw---S--- 1 foundationdb foundationdb  28K Aug 11 18:58 coordination-0.fdq
-rw---S--- 1 foundationdb foundationdb  16K Aug 11 18:58 coordination-1.fdq
-rw---S--- 1 foundationdb foundationdb 172K Aug 11 18:58 globalconf-0.fdq
-rw---S--- 1 foundationdb foundationdb  16K Feb  2  2024 globalconf-1.fdq
-rw---S--- 1 foundationdb foundationdb 4.0K Feb  2  2024 localconf-0.fdq
-rw---S--- 1 foundationdb foundationdb 4.0K Feb  2  2024 localconf-1.fdq
-rw---S--- 1 foundationdb foundationdb 100M Aug 14 15:07 log2-V_6-e49be5f9f4233f26cfb243fe0802b8b8.sqlite
-rw---S--- 1 foundationdb foundationdb  76K Aug 14 15:07 log2-V_6-e49be5f9f4233f26cfb243fe0802b8b8.sqlite-wal
-rw---S--- 1 foundationdb foundationdb  31M Aug 14 11:35 logqueue-V_6-e49be5f9f4233f26cfb243fe0802b8b8-0.fdq
-rw---S--- 1 foundationdb foundationdb  56M Aug 14 15:07 logqueue-V_6-e49be5f9f4233f26cfb243fe0802b8b8-1.fdq
-rw---S--- 1 foundationdb foundationdb 4.0K Feb  2  2024 processId
-rw---S--- 1 foundationdb foundationdb 5.0G Aug 14 15:07 storage-c20566c16c046b548baadbf8fa0207fb.sqlite
-rw---S--- 1 foundationdb foundationdb  14M Aug 14 15:07 storage-c20566c16c046b548baadbf8fa0207fb.sqlite-wal
-rw------- 1 foundationdb foundationdb 4.0K Feb  2  2024 sw-version

I do remember the storage-c20566c16c046b548baadbf8fa0207fb.sqlite file was around 4.9GB or 5.0GB when I checked during the broken state. We were never close to fully exhausting the data volume.

I assume the failing event is AsyncFileKAIOAllocateError again, since it is not included in your paste. It should not have to grow the file by multiple GBs.

I suspect it may be some sort of WAL-compaction or housekeeping/wiggle operation. I assume the filename in the “No space left on device” error on that node is different from the currently active SQLite file?

What kind of housekeeping/wiggle operation would need to write multiple GBs? Also, keep in mind this is a single-machine, single-process cluster.