Configure fdb to write to disk only once every 10 mins

I am trying to set up an FDB cluster in such a way that it can handle 1M small writes per second.

Is it possible to configure the cluster in such a way so that:

  1. writes, by default, go only into memory, and not into SSD
  2. every 10 mins or so, all transactions get flushed out to SSD

So I’m okay with a world where a crash results in losing 10 minutes’ worth of transactions, as long as each transaction is all-or-nothing. I am willing to make this tradeoff in exchange for more performance.

There is a “memory” storage engine where FDB persists all writes to an on-disk WAL, but the actual database is kept in memory. This may be faster for writes, though I’m not sure.
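For reference, switching to the memory engine is done with `fdbcli`’s `configure` command. The redundancy mode below (`single`) is just an example; use whatever your cluster already runs.

```shell
# Inside fdbcli, against a running cluster:
# switch the storage engine to the in-memory engine
# (writes still go through the on-disk WAL for durability).
configure single memory
```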

In theory (though I’m not sure whether it’s possible in practice with fdb), I think we can go a step further.

Within a 10-minute block, only write to memory and don’t touch the on-disk WAL. Every 10 minutes, compact all the transactions/diffs within the block and write them out to the on-disk WAL. (So if a particular hot key is written to 100 times in the 10-minute block, we only write the last entry.)
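The compaction idea above isn’t an FDB feature, but the last-write-wins part can be sketched on the client side: buffer writes in memory and flush only the final value per key once per window. This is a minimal illustration, not FDB code; the class name and the `sink` callable are assumptions.

```python
import time

class CoalescingBuffer:
    """Buffer writes in memory; flush only the last value per key."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.pending = {}                  # key -> last value in this window
        self.last_flush = time.monotonic()

    def write(self, key, value):
        # Later writes to the same key overwrite earlier ones, so a key
        # written 100 times in a window is flushed only once.
        self.pending[key] = value

    def maybe_flush(self, sink):
        if time.monotonic() - self.last_flush >= self.window:
            self.flush(sink)

    def flush(self, sink):
        # sink is any callable taking the final key/value pairs, e.g. a
        # function that commits them all in one FDB transaction.
        sink(dict(self.pending))
        self.pending.clear()
        self.last_flush = time.monotonic()

buf = CoalescingBuffer(window_seconds=600)
for i in range(100):
    buf.write(b"hot-key", str(i).encode())
buf.flush(lambda batch: print(len(batch)))  # only one entry survives
```

Note this only coalesces writes; it does not give the all-or-nothing guarantee per original transaction unless the flush itself is a single transaction.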

Is there some way to do to this in fdb? Where I’m explicitly saying: it’s okay to lose the last 10 minutes of writes, as long as each transactions are all-or-nothing.

You might be able to achieve this at the filesystem level: how about putting the log and storage process data directories on a ZFS dataset with sync=disabled, and raising ZFS’s flush timeout from the default 5 seconds to 10 minutes? This would be completely transparent to fdb: when fdb calls fsync it returns immediately, but in reality the data is only in memory. Not sure how it will behave after a crash, though.
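Concretely, that would look something like the following (the dataset name `tank/fdb` is a placeholder; the module parameter path applies to OpenZFS on Linux):

```shell
# Make fsync a no-op on this dataset: fdb's fsync calls return
# immediately while data sits in the in-memory transaction group.
zfs set sync=disabled tank/fdb

# Stretch the transaction-group flush interval from the default
# 5 seconds to 10 minutes (value is in seconds).
echo 600 > /sys/module/zfs/parameters/zfs_txg_timeout
```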


Short answer: I am almost certain you can’t do that without code changes.

Longer answer:

There are two systems involved in the write path: the transaction log and the storage servers, and you have to look at them separately. The storage servers only write to disk every ~500ms IIRC. This is beneficial, since writing to a B-tree is usually faster in batches, and the MVCC window is in memory anyway. You could probably set this interval to 10 minutes and it would just work (assuming you have enough memory), though it will increase pressure on the transaction log, which will result in longer recovery times (but it sounds like you’re fine with that).
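If you want to experiment with that interval, server behavior like this is controlled through knobs, which can be set per process in `foundationdb.conf`. A sketch, with the caveat that the knob name (`storage_commit_interval`) and its ~0.5s default should be verified against the `ServerKnobs` source of your FDB version:

```shell
# /etc/foundationdb/foundationdb.conf (excerpt)
# Knob names and defaults vary by version -- check ServerKnobs first.
[fdbserver]
knob_storage_commit_interval = 600
```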

The bigger problem is the transaction log: FDB will simply refuse to acknowledge a commit before it has been written to all transaction logs and an fsync has returned successfully. Acknowledging earlier would violate the ACID guarantees, and such a feature simply doesn’t exist.

I am not sure whether the filesystem trick will work, but keep in mind that we use O_DIRECT; I’m not sure exactly how the filesystem implementation will handle that situation.

This being said: btrfs and ZFS probably don’t perform well for FDB (or any B-tree). So my guess is that you’ll get worse performance even if it works. (I am not criticizing these filesystems; they’re amazing and I am a big fan, but they’re not built for B-trees. For a storage engine you want something simple, and ZFS is basically its own storage engine.)
