Guide on using foundationdb w/ blob store?

I realize small objects can be stored in fdb by breaking them up into 100kb segments.

This, however, doesn’t work very well for GenAI training sets / models.

Is there a guide on using fdb w/ s3/minio-style external object stores?

I don’t know of a guide, but is there a particular obstacle in your way? Every object store has a way to uniquely identify a chunk of data. You could store the metadata & unique ID for the chunk in FDB. Ask yourself ahead of time how you would like to query the metadata and make sure to create indexes (in FDB) based on your query approach.
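
The KV layout part is pretty mechanical. Here's a minimal sketch in Python (the `obj` / `by_dataset` subspace names and metadata fields are made up for illustration; the real layout depends on how you want to query):

```python
import fdb

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def register_object(tr, object_id, dataset, s3_key, size):
    # Primary record: ('obj', object_id) -> (s3_key, size, dataset)
    tr[fdb.tuple.pack(("obj", object_id))] = fdb.tuple.pack((s3_key, size, dataset))
    # Secondary index so objects can be listed per dataset:
    # ('idx', 'by_dataset', dataset, object_id) -> ''
    tr[fdb.tuple.pack(("idx", "by_dataset", dataset, object_id))] = b""

@fdb.transactional
def list_dataset(tr, dataset):
    # Range-read the index; the object id is the last tuple element of each key.
    return [fdb.tuple.unpack(k)[-1]
            for k, _ in tr[fdb.tuple.range(("idx", "by_dataset", dataset))]]
```

The point is just that every query you care about ("all objects in dataset X", "all objects newer than Y", …) should have its own index subspace maintained in the same transaction as the primary record.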

Sorry for not being clear. Let me try to rephrase the question:

  1. iirc fdb limits are 10kb keys, 100kb values
  2. 10GB file, fairly common in ML, would be 100_000 100kb chunks
  3. splitting said 10GB file into 100_000 100kb chunks would be silly, as we are unlikely to ever update a 100kb chunk on its own
  4. iirc snowflake & deepseek both have their own techniques for (1) metadata in fdb, (2) actual data in s3 / blob store
  5. I’m wondering if there is any published idiomatic approach mirroring snowflake/deepseek’s, so I can learn from their strategy rather than rediscover their mistakes

I haven’t researched how those companies organize their KVs, so I don’t know.

I’m open to s3/minio-ish + fdb resources not from snowflake/deepseek too.

Just as a simple example: we don’t want to do writes to s3/minio inside an fdb transaction (5-second limit, and transactions can be retried); so the s3/minio writes must happen before or after the fdb transaction.

But then this sounds race-y. One possible way to get around this is to use keyname = sha512(content) for s3/minio, and store the metadata in fdb. But there are questions around this, as well as versioning, …
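
Something like this is what I’m picturing (a rough sketch only, using the Python fdb and boto3 bindings; the bucket name and key prefix are made up):

```python
import hashlib
import boto3
import fdb

fdb.api_version(710)
db = fdb.open()
s3 = boto3.client("s3")

BUCKET = "training-blobs"  # made-up bucket name

def put_blob(db, name, content: bytes):
    # Content-addressed S3 key: identical bytes always land on the same key,
    # so a retried or abandoned upload is idempotent / harmless.
    digest = hashlib.sha512(content).hexdigest()
    s3_key = f"blobs/{digest}"
    s3.put_object(Bucket=BUCKET, Key=s3_key, Body=content)

    # Only after the blob is durable in S3 does FDB start pointing at it.
    @fdb.transactional
    def publish(tr):
        tr[fdb.tuple.pack(("obj", name))] = fdb.tuple.pack((s3_key,))
    publish(db)
    return s3_key
```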

Anyway, if anyone comes across some guide on “storing s3/minio metadata” in fdb, please ping me.

Thanks!

I can talk about how we do it at my job…

We use a 2-phase commit to avoid the problems you’re talking about, but you can often choose the appropriate write order so that dangling writes don’t negatively affect the system, even without 2-phase commit.

For instance, choose which data source will be the “entrypoint” for the data. Usually, this is the index (FDB). By entrypoint, I mean which data source will be read first when you’re combining data from two sources. This entrypoint is the most important one for data consistency. If you have a binary blob which lacks an index in FDB, nothing will ever read it and it won’t have any effect on the system (besides wasting space). The inverse is not true: if you have an index lacking a blob then the system will see the index, try to read the blob, and fail.

For the 2-phase commit, you can have a list in FDB for incomplete operations. Entries in this list should have enough info to find the blob. You create this entry first with a timestamp attached. Then you do the blob write. Then you do the metadata write & delete the incomplete-operation entry in 1 transaction. If the incomplete entry becomes too old then you have a background service delete the blob, as we’re assuming the process performing the write died halfway through.
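
Not our actual code, but a sketch of the shape of it in Python (fdb + boto3; the subspace names, bucket, and grace period are placeholders):

```python
import time
import uuid
import boto3
import fdb

fdb.api_version(710)
db = fdb.open()
s3 = boto3.client("s3")

BUCKET = "training-blobs"   # placeholder bucket
GC_GRACE_SECONDS = 3600     # placeholder: how old an intent must be before GC touches it

@fdb.transactional
def begin_write(tr, op_id, s3_key):
    # Phase 1: record the intent (with a timestamp) before touching S3.
    tr[fdb.tuple.pack(("incomplete", op_id))] = fdb.tuple.pack((s3_key, int(time.time())))

@fdb.transactional
def finish_write(tr, op_id, name, s3_key):
    # Phase 2: publish the metadata and clear the intent in one transaction.
    tr[fdb.tuple.pack(("obj", name))] = fdb.tuple.pack((s3_key,))
    del tr[fdb.tuple.pack(("incomplete", op_id))]

def put_blob(db, name, content):
    op_id = uuid.uuid4().hex
    s3_key = f"blobs/{op_id}"
    begin_write(db, op_id, s3_key)
    s3.put_object(Bucket=BUCKET, Key=s3_key, Body=content)
    finish_write(db, op_id, name, s3_key)

@fdb.transactional
def stale_intents(tr, now):
    # Collect intents older than the grace period.
    stale = []
    for k, v in tr[fdb.tuple.range(("incomplete",))]:
        s3_key, started = fdb.tuple.unpack(v)
        if now - started > GC_GRACE_SECONDS:
            stale.append((k, s3_key))
    return stale

@fdb.transactional
def clear_intent(tr, key):
    del tr[key]

def gc_dangling_blobs(db):
    # Background service: delete the blob, then clear the intent row.
    for key, s3_key in stale_intents(db, int(time.time())):
        s3.delete_object(Bucket=BUCKET, Key=s3_key)
        clear_intent(db, key)
```

One caveat with a sketch like this: a very slow writer could still be mid-upload when the GC fires, so the grace period needs to be comfortably longer than your slowest expected upload, or `finish_write` should re-check that its intent row still exists and abort if it doesn’t.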

so your core invariant is:
if it exists in fdb, it must exist in s3?

so:

insert = insert s3; insert fdb
delete = delete fdb; delete s3
update = delete fdb; overwrite s3; insert fdb

?

Correct. Though with the insert and update operations you would write the “incomplete entry” row to FDB before the blob write so a GC can clean up dangling blobs later.
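
For the delete path, a sketch of the same shape (illustrative Python again, reusing the made-up ('obj', name) layout and the `s3`/`BUCKET` names from above): remove the FDB index first so nothing can find the blob, then the S3 delete can happen at leisure.

```python
@fdb.transactional
def unpublish(tr, name):
    # Delete the index entry first; return the S3 key so the caller can
    # remove the blob afterwards. Returns None if the object wasn't there.
    key = fdb.tuple.pack(("obj", name))
    val = tr[key]
    if not val.present():
        return None
    del tr[key]
    return fdb.tuple.unpack(val)[0]

def delete_blob(db, name):
    s3_key = unpublish(db, name)
    if s3_key is not None:
        # Even if this delete fails or the process dies here, the invariant
        # holds: FDB no longer references the blob, so it's just garbage.
        s3.delete_object(Bucket=BUCKET, Key=s3_key)
```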
