Should I optimize for single reads?


(Jon Anderson) #1

So I have a blob of data. This blob of data has some metadata associated with it. Both are static and won’t be changing after insertion. Originally, I was planning on creating a key like this:

(category 1, category 2, timestamp, ID, version, timestamp) = blob

Does this key seem too large to anyone? It’s designed this way because I’ll potentially be doing range reads on each prefix of this key (except possibly the last timestamp). Another pro is that a single entry gives me the blob and all its metadata (instead of reading from multiple entries to get all this info).

Am I using the abstractions properly here? Should I be using directories for any of these prefixes instead of simple keys?
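
To make the intended access pattern concrete, here is a minimal sketch of prefix range reads over a composite key like the one above. It is an in-memory stand-in, not the FoundationDB client: it relies on Python tuples comparing element by element, which is similar in spirit to how tuple-encoded keys sort. The field values and the helper `range_by_prefix` are made up for illustration.

```python
import bisect

# Hypothetical records keyed by
# (category1, category2, timestamp, id, version, timestamp2) -> blob
store = sorted([
    (("audio", "raw", 1000, "a1", 1, 1005), b"blob-1"),
    (("audio", "raw", 1000, "a1", 2, 1010), b"blob-2"),
    (("audio", "raw", 2000, "b7", 1, 2001), b"blob-3"),
    (("audio", "mixed", 1500, "c3", 1, 1502), b"blob-4"),
    (("video", "raw", 1200, "d9", 1, 1203), b"blob-5"),
])
keys = [k for k, _ in store]


def range_by_prefix(prefix):
    """Return all (key, blob) pairs whose key starts with `prefix`."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first matching key (if there is one).
    lo = bisect.bisect_left(keys, prefix)
    hi = lo
    while hi < len(keys) and keys[hi][:len(prefix)] == prefix:
        hi += 1
    return store[lo:hi]


# Every leading portion of the key can serve as a range-read prefix.
by_category = range_by_prefix(("audio",))
by_subcategory = range_by_prefix(("audio", "raw"))
by_id = range_by_prefix(("audio", "raw", 1000, "a1"))
```

Each match carries the blob together with every metadata field in its key, which is the single-read property described above.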


(Steve Atherton) #2

This sounds perfectly fine for what you are doing. Note that a single value cannot be larger than 100,000 bytes, so each blob has to fit within that limit. The main reason to split your unchanging blobs into multiple key-value pairs is if you want to be able to read some of the split entries without reading the entire blob value. The current cost of splitting is that the repeated keys are stored in full on disk, because the ssd storage engine does not compress keys. The upcoming Redwood storage engine will have key prefix compression, so such splitting will become cheaper.
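
If a blob did need to be split, one possible layout is to append a chunk index to the key. The sketch below is plain Python with no database calls; the helpers `split_blob` and `join_blob` and the 90,000-byte chunk size are assumptions chosen for illustration, not anything prescribed by FoundationDB.

```python
CHUNK_SIZE = 90_000  # assumed chunk size, kept under the 100,000-byte value limit


def split_blob(key, blob, chunk_size=CHUNK_SIZE):
    """Return a list of (key + (chunk_index,), chunk_bytes) pairs."""
    return [
        (key + (i,), blob[off:off + chunk_size])
        for i, off in enumerate(range(0, len(blob), chunk_size))
    ]


def join_blob(pairs):
    """Reassemble a blob from its chunk pairs, ordered by key."""
    return b"".join(chunk for _, chunk in sorted(pairs))


# Hypothetical key and a 250,000-byte blob of zeros.
key = ("audio", "raw", 1000, "a1", 1, 1005)
blob = bytes(250_000)
pairs = split_blob(key, blob)
restored = join_blob(pairs)
```

Every chunk repeats the full six-field key before its index, which is the on-disk cost described above when the storage engine does not compress key prefixes.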