The Known Limitations mention value sizes up to 100KB at the time of writing. Is there anything on the roadmap for handling larger value sizes?
Let’s assume for a minute that FoundationDB could handle values in the 5-10MB range. That could eventually make FoundationDB a valid on-premise S3 replacement (cf. multi-part upload). Even better, I could combine S3 and DynamoDB into a single storage solution, using different “layers” to store data and metadata in a single transaction.
Also, handling larger chunks would be a perfect fit for this HTTP extension: tus.io
I believe Dave talked about this in an earlier post, and it should probably be in the FAQ since the question comes up so often.
It’s super easy to store huge values split over multiple keys; that’s not a problem. You just create a virtual table of keys where each successive key stores the next block. Then you can use a range query to get the entire blob back out in a streaming fashion.
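For example, here’s a minimal sketch of that pattern using the Python bindings (the names blob_space, write_blob, read_blob and the 90KB chunk size are just illustrative choices, not an official layer):

```python
import fdb

fdb.api_version(710)  # use the API version matching your installed client
db = fdb.open()

blob_space = fdb.Subspace(('blobs',))  # all chunk keys live under this prefix
CHUNK_SIZE = 90_000                    # stay safely under the 100KB value limit

@fdb.transactional
def write_blob(tr, blob_id, data):
    # Each successive chunk goes under (blobs, blob_id, chunk_index).
    for i in range(0, len(data), CHUNK_SIZE):
        tr[blob_space.pack((blob_id, i // CHUNK_SIZE))] = data[i:i + CHUNK_SIZE]

@fdb.transactional
def read_blob(tr, blob_id):
    # One range read returns the chunks in key order; concatenate (or yield)
    # them to reconstruct the original value in a streaming fashion.
    return b''.join(value for _, value in tr[blob_space.range((blob_id,))])
```

Calling write_blob(db, 'some-id', payload) splits the payload automatically, and read_blob(db, 'some-id') reassembles it.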
The issue is doing it in a way that works generically and makes sense as a layer that works across different user applications, since your key space might be mixed up with all kinds of other information.
That said, if that were your primary use case for FDB this would not be a problem.
This is also closely related to the FoundationDB as a Document Store question that also comes up a lot. It’s pretty much the same question.
In any case, yes, you can do it with very little effort, and you don’t need to wait for an official layer that provides this functionality.
I understand that I can do it now and it was clear to me that I need to manage the list of the chunk keys. I am currently not after a generic way to handle this.
But with a max value (chunk) size of 100KB, it’s not an easy mapping from HTTP client uploads, which deliver chunks of e.g. 5MB. I would still have to chop those up further into 100KB chunks, and reassemble them on retrieval. I would rather have a 1-to-1 mapping in place.
Sure, I understand the desire for a single API where you just hand it a byte array of whatever size and it simply stores or retrieves it, but you can build that yourself.
It’s really not hard, and please don’t hesitate to PM me if you need help, though I’m certain that if you are willing to take on FDB then you can do it.
The mapping is absolutely no worse than reading or writing a block at a time from disk.
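To make that concrete, here is a rough sketch (the names store_part and part_space are hypothetical) of how one 5MB part from a multi-part upload maps onto 100KB values. Each part comfortably fits in a single transaction, so the extra chopping is just a loop:

```python
CHUNK_SIZE = 90_000  # keep each value under the 100KB limit

@fdb.transactional
def store_part(tr, part_space, file_id, part_number, part_bytes):
    # A 5MB part becomes ~56 keys of the form (file_id, part_number, chunk_index);
    # retrieval is the reverse: range-read the part's keys and join the values.
    for i in range(0, len(part_bytes), CHUNK_SIZE):
        key = part_space.pack((file_id, part_number, i // CHUNK_SIZE))
        tr[key] = part_bytes[i:i + CHUNK_SIZE]
```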
Splitting a value into smaller chunks is “easy” enough, but the less visible gotcha is that you also have a 10MB per-transaction limit!
So if you want to store a “file” larger than 10MB (even chunked into 100KB keys), it will not be possible to do it in a single transaction; you will need two or more!
But now:
a concurrent reader could observe a partially uploaded file, so you need additional signalling with a “status” key (see the sketch after this list), and so on,
what happens if the upload crashes mid-flight, or if the web worker process dies? You need to garbage-collect fragmented or incomplete documents at some point. But which process will do it? How often?
if you want to support resuming by the user, you will need some sort of index to map the “random” document ID into a specific file or user session.
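A rough sketch of one way to handle the first point (names like file_space, write_batch and mark_complete are assumptions; garbage collection and a resume index are left out): chunks are written across several transactions, and a “status” key is committed last, so readers simply ignore any file that doesn’t have one yet.

```python
import time
import uuid

import fdb

fdb.api_version(710)  # match your installed client version
db = fdb.open()

file_space = fdb.Subspace(('files',))
CHUNK_SIZE = 90_000      # per-value limit is 100KB
CHUNKS_PER_TXN = 50      # ~4.5MB of values per transaction, well under the 10MB limit

@fdb.transactional
def write_batch(tr, file_id, batch):
    # batch is a list of (chunk_index, chunk_bytes) pairs
    for index, chunk in batch:
        tr[file_space.pack((file_id, 'data', index))] = chunk

@fdb.transactional
def mark_complete(tr, file_id, total_chunks):
    # Readers treat a file as visible only once this key exists.
    tr[file_space.pack((file_id, 'status'))] = fdb.tuple.pack(
        ('complete', total_chunks, int(time.time())))

def upload_file(db, data):
    file_id = uuid.uuid4().hex
    chunks = [(i // CHUNK_SIZE, data[i:i + CHUNK_SIZE])
              for i in range(0, len(data), CHUNK_SIZE)]
    # Spread the chunks over as many transactions as needed...
    for start in range(0, len(chunks), CHUNKS_PER_TXN):
        write_batch(db, file_id, chunks[start:start + CHUNKS_PER_TXN])
    # ...and only then publish the status key, in its own final transaction.
    mark_complete(db, file_id, len(chunks))
    return file_id
```

If upload_file dies between transactions, the orphaned “data” keys for that file_id are exactly what the garbage-collection question above is about.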