Variable chunk size for blobs


(Jon Anderson) #1

Here is the page describing storage of blobs in Java. You can see the two lines for determining the chunk size below:

int numChunks = (data.length() + CHUNK_SIZE - 1)/CHUNK_SIZE;
int chunkSize = (data.length() + numChunks)/numChunks;

What is the purpose of the chunkSize variable? Why not simply use the CHUNK_SIZE constant?
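To make the question concrete, here is a small sketch that wraps the two quoted lines in helper methods and prints what they compute for a few lengths. The class name `ChunkMath` and the value `CHUNK_SIZE = 10` are arbitrary choices for illustration, not taken from the blob page. As quoted, the two lines would divide by zero for an empty string, so the sketch guards that case.

```java
// Sketch: the two chunk-size lines from the Java blob page, wrapped as helpers.
// CHUNK_SIZE = 10 is an arbitrary illustrative value (assumption).
class ChunkMath {
    static final int CHUNK_SIZE = 10;

    // Ceiling division: the fewest chunks of at most CHUNK_SIZE characters
    // needed to hold `length` characters.
    static int numChunks(int length) {
        return (length + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    // Spread the data over numChunks chunks of roughly equal size.
    static int chunkSize(int length) {
        int n = numChunks(length);
        if (n == 0) {
            return 0; // empty data: the quoted formula would divide by zero here
        }
        return (length + n) / n;
    }

    public static void main(String[] args) {
        for (int length : new int[] {1, 10, 20, 25}) {
            System.out.printf("length=%d numChunks=%d chunkSize=%d%n",
                length, numChunks(length), chunkSize(length));
        }
    }
}
```

With these numbers a 25-character string gets `chunkSize = 9` rather than 10. Because the numerator adds `numChunks` rather than `numChunks - 1`, the result can also come out one larger than `CHUNK_SIZE`: a 20-character string gets `chunkSize = 11`.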


(Christophe Chevalier) #2

I think it’s trying to equalize the size of the chunks. With CHUNK_SIZE alone, the last chunk can be much smaller than the others, whereas here they all end up roughly the same size… though I’m not sure this is a real concern. I don’t see any performance reason to go one way or the other, and equalizing the chunk sizes makes it harder to support append semantics on your blobs: the chunk size is derived from the total length, so it would change every time data is appended.
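A quick way to see the difference is to list the chunk lengths each strategy produces for the same input. This is a sketch with a hypothetical class name and the same arbitrary `CHUNK_SIZE = 10`; it only computes lengths and does not touch a database.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch comparing fixed-size and equalized chunking.
// CHUNK_SIZE = 10 is an arbitrary illustrative value (assumption).
class ChunkLayouts {
    static final int CHUNK_SIZE = 10;

    // Fixed-size chunks: every chunk is CHUNK_SIZE long except possibly the last.
    static List<Integer> fixedLengths(int length) {
        List<Integer> sizes = new ArrayList<>();
        for (int start = 0; start < length; start += CHUNK_SIZE) {
            sizes.add(Math.min(CHUNK_SIZE, length - start));
        }
        return sizes;
    }

    // Equalized chunks, using the two lines quoted from the Java blob page.
    static List<Integer> equalizedLengths(int length) {
        List<Integer> sizes = new ArrayList<>();
        if (length == 0) {
            return sizes; // the quoted formula would divide by zero here
        }
        int numChunks = (length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        int chunkSize = (length + numChunks) / numChunks;
        for (int start = 0; start < length; start += chunkSize) {
            sizes.add(Math.min(chunkSize, length - start));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(fixedLengths(25));     // [10, 10, 5]
        System.out.println(equalizedLengths(25)); // [9, 9, 7]
    }
}
```

Both layouts cover the same bytes with the same number of keys here; the equalized one just avoids a small trailing chunk. With fixed-size chunks, an append only has to fill up the last chunk and add new ones, which is why that layout is friendlier to append semantics.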

Another thing: the referenced code uses the Tuple encoding for the value… which is a bit wasteful, since every chunk pays for a type-code byte, a terminator byte, and an escape byte for each embedded 0x00. I would simply have split the bytes of the string’s UTF-8 representation into smaller chunks. This is even simpler when storing plain bytes.
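Here is a minimal sketch of the raw-bytes approach, assuming chunks are keyed by their start offset as in the blob page. A `TreeMap` stands in for the database; in FoundationDB each entry would be one key/value pair, but no FoundationDB API is called here. `tupleEncodedLength` is a hypothetical helper that re-implements only the size of the tuple layer's Unicode-string encoding (type code 0x02, UTF-8 bytes with 0x00 escaped as 0x00 0xFF, then a 0x00 terminator) so the overhead can be compared; it is not the real `Tuple` API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

// Sketch: store raw byte slices instead of Tuple-encoded substrings.
// The TreeMap (offset -> bytes) stands in for the database (assumption).
class RawChunks {
    static final int CHUNK_SIZE = 10; // illustrative value only

    static TreeMap<Integer, byte[]> split(byte[] data) {
        TreeMap<Integer, byte[]> chunks = new TreeMap<>();
        for (int start = 0; start < data.length; start += CHUNK_SIZE) {
            int end = Math.min(start + CHUNK_SIZE, data.length);
            chunks.put(start, Arrays.copyOfRange(data, start, end));
        }
        return chunks;
    }

    // Concatenate chunks in offset order. A boundary may fall inside a
    // multi-byte UTF-8 character; that is harmless because decoding only
    // happens after all bytes have been joined back together.
    static byte[] join(TreeMap<Integer, byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks.values()) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Size of a one-element tuple holding this text as a Unicode string:
    // 1 type-code byte + UTF-8 bytes (+1 per escaped 0x00) + 1 terminator.
    static int tupleEncodedLength(byte[] utf8) {
        int zeros = 0;
        for (byte b : utf8) {
            if (b == 0) zeros++;
        }
        return 1 + utf8.length + zeros + 1;
    }

    public static void main(String[] args) {
        // Non-ASCII characters written as escapes so the source compiles under any encoding.
        String text = "Blobs in FoundationDB: h\u00e9llo w\u00f6rld";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        TreeMap<Integer, byte[]> chunks = split(utf8);
        String restored = new String(join(chunks), StandardCharsets.UTF_8);
        System.out.println("chunks: " + chunks.size() + ", round trip ok: " + restored.equals(text));
        System.out.println("raw bytes: " + utf8.length + ", as a tuple: " + tupleEncodedLength(utf8));
    }
}
```

The per-chunk saving is small, but it adds up over many chunks, and the raw layout also lets the same code store arbitrary binary data without any string conversion.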


(Christophe Chevalier) #3

If you are looking for a “full-featured” blob layer implementation, there was an original Blob Layer sample written in Python, but it is probably no longer available.

I ported it to C# a long time ago; it’s still available here: https://github.com/Doxense/foundationdb-dotnet-client/blob/master/FoundationDB.Layers.Common/Blobs/FdbBlob.cs

Disclaimer: In practice, I never used this implementation, because I don’t need to support sparse files or shrinking/truncating blobs, and its key layout is a bit odd. The implementation we use is a lot more straightforward, and adds support for compression, deduplication and attribute indexing.
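For readers wondering what chunk-level deduplication can look like, here is one common approach sketched under loud assumptions: it is NOT the implementation mentioned above, whose details aren't described here. Each distinct chunk is stored once under its SHA-256 digest, and a blob is just the ordered list of digests. The class `DedupStore`, its two maps (standing in for database subspaces) and the tiny `CHUNK_SIZE = 4` are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical content-addressed chunk store. The maps stand in for
// database subspaces; no FoundationDB API is used.
class DedupStore {
    static final int CHUNK_SIZE = 4; // tiny value so duplicate chunks are easy to see

    final Map<String, byte[]> chunksByHash = new HashMap<>(); // digest -> chunk bytes
    final Map<String, List<String>> blobs = new HashMap<>();   // blob name -> digests

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must support SHA-256
        }
    }

    void put(String name, byte[] data) {
        List<String> hashes = new ArrayList<>();
        for (int start = 0; start < data.length; start += CHUNK_SIZE) {
            byte[] chunk = Arrays.copyOfRange(data, start, Math.min(start + CHUNK_SIZE, data.length));
            String h = sha256Hex(chunk);
            chunksByHash.putIfAbsent(h, chunk); // identical chunks are stored only once
            hashes.add(h);
        }
        blobs.put(name, hashes);
    }

    byte[] get(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String h : blobs.get(name)) {
            byte[] chunk = chunksByHash.get(h);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        DedupStore store = new DedupStore();
        store.put("a", "abcdabcdabcd".getBytes(StandardCharsets.UTF_8)); // three identical chunks
        store.put("b", "abcdxyz".getBytes(StandardCharsets.UTF_8));      // shares "abcd" with "a"
        System.out.println("distinct chunks stored: " + store.chunksByHash.size());
        System.out.println(new String(store.get("b"), StandardCharsets.UTF_8));
    }
}
```

Fixed-size chunks only deduplicate data that lines up on the same boundaries; a real system would also need to garbage-collect chunks no longer referenced by any blob, which this sketch ignores.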