(I previously asked about key expiry here – this is a different, but related, use case.)
I’m considering FDB for a use case with the following properties:
- Simple key-value storage
- Keys are 8-byte hashes
- Values are large (mean 5k) binary blobs
- Each key/value pair expires after a fixed interval T has passed (perhaps 24h)
- Keys are never deleted apart from expiration
- Writes to existing keys must be rejected
- The rate of writes is fairly high (10k/sec)
- The rate of reads is low (500/sec)
- Most reads are either for nonexistent keys or for recently written keys
Option A would be to store the values with the expiry keys:
```
(timestamp, key) -> value
(key) -> timestamp
```
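Roughly what I have in mind for a write under A, as a sketch using the Python bindings (the `expiry`/`pk` subspace names and the use of wall-clock seconds as the timestamp are just placeholders I picked for illustration):

```python
import time
import fdb

fdb.api_version(630)
db = fdb.open()

expiry = fdb.Subspace(('expiry',))  # (timestamp, key) -> value
pk = fdb.Subspace(('pk',))          # (key) -> timestamp

@fdb.transactional
def write_a(tr, key, value):
    # Reject writes to existing keys: one point read of the PK entry.
    if tr[pk.pack((key,))].present():
        return False
    ts = int(time.time())
    tr[expiry.pack((ts, key))] = value           # value lives with the expiry key
    tr[pk.pack((key,))] = fdb.tuple.pack((ts,))  # PK entry records the write timestamp
    return True
```

Called as `write_a(db, key, value)`. As I understand it, the point read of the PK entry also puts that key in the transaction's read conflict range, so two concurrent writes of the same key can't both succeed.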
Option B is to store the values with the primary keys:
```
(timestamp, key) -> ""
(key) -> value
```
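The write under B is the same shape (reusing the imports and subspaces from the sketch above); the only difference is which entry carries the blob:

```python
@fdb.transactional
def write_b(tr, key, value):
    if tr[pk.pack((key,))].present():
        return False                  # duplicate: reject
    ts = int(time.time())
    tr[expiry.pack((ts, key))] = b''  # expiry entry is just an index
    tr[pk.pack((key,))] = value       # blob lives with the primary key
    return True
```

Under B a read is then a single point read of `pk.pack((key,))`.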
I was leaning toward A because the data that expires at the same time would then be contiguous in the keyspace, so I would expect writes and (especially) the range-clear deletes at expiration to be more efficient. But on the other hand, since every new write lands at the current end of the timestamp range, does this concentrate lots of writes on the same FDB storage process?
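For what it's worth, the expiration sweep I'm imagining under A is one range read plus clears. Again just a sketch (a real version would have to chunk the work to stay under the 5-second transaction duration and 10 MB size limits):

```python
@fdb.transactional
def expire_before(tr, cutoff_ts):
    # Under A, everything expiring before the cutoff is one contiguous range.
    begin = expiry.range().start
    end = expiry.pack((cutoff_ts,))
    # Read the expired entries so their PK entries can be cleared too.
    for kv in tr.get_range(begin, end):
        ts, key = expiry.unpack(kv.key)
        tr.clear(pk.pack((key,)))
    tr.clear_range(begin, end)
```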
With B, reads only have to consult the primary key, not the expiry key, but given the high write:read ratio I don't care about optimizing for reads. (Note that writes have to consult the PK in either A or B, because of the need to detect and reject duplicate keys.)
Should I go with A or B or something else entirely?
Is FoundationDB a good fit for this use case at all? I think the common wisdom is that B-tree-based databases are best for read-heavy workloads and that you want an LSM for write-heavy workloads, but it seems FDB should be okay here given enough machines with fast SSDs.