I wanted to understand the internals in more depth, specifically around storage. My guess is that scans are fast due to how the underlying storage is designed. I also wanted to understand the source of the 100 KB limit on value size.
Any pointers to existing docs would also be really helpful.
- There should be a simple documented limit rather than a slow descent into pain as values get larger.
- The limit should be small enough that reading a single KV doesn’t represent a noticeable latency blip for the entire server.
- The limit should be high enough to not be annoying to developers.
- The limit should be high enough that the cost of reading the bytes associated with the value is significantly more expensive than the per-request overhead (this enables a low abstraction penalty for applications storing large data across many key-value pairs).
Given FoundationDB’s use of SSDs, 100K was chosen years ago as fitting those constraints.
A layer can easily build a “large-value” abstraction that supports seeking and streaming of large values by storing, say, 64K at a time in multiple keys.
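To illustrate that idea, here is a minimal sketch of such a chunking layer using the Python bindings. The chunk size, subspace name, and function names (`CHUNK_SIZE`, `blob_space`, `write_blob`, `read_blob`) are hypothetical, and the API version should be adjusted to match your installation.

```python
import fdb

fdb.api_version(710)  # adjust to the API version of your installation
db = fdb.open()

CHUNK_SIZE = 64 * 1024                 # hypothetical chunk size, well under the 100K value limit
blob_space = fdb.Subspace(('blob',))   # hypothetical subspace holding all blob chunks

@fdb.transactional
def write_blob(tr, name, data):
    # Clear any chunks left over from a previous, possibly larger, version of this blob.
    tr.clear_range_startswith(blob_space.pack((name,)))
    # Store the value as consecutive chunks keyed by their byte offset.
    for offset in range(0, len(data), CHUNK_SIZE):
        tr[blob_space.pack((name, offset))] = data[offset:offset + CHUNK_SIZE]

@fdb.transactional
def read_blob(tr, name):
    # Range-read all chunks for this blob and reassemble them in key (offset) order.
    return b''.join(kv.value for kv in tr[blob_space[name].range()])

write_blob(db, 'example', b'x' * 500_000)
assert read_blob(db, 'example') == b'x' * 500_000
```

Note that because `write_blob` runs in a single transaction, this sketch only addresses the per-value limit; it is still bounded by the total transaction size limit, which is the limitation picked up later in the thread.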
An interesting improvement to FDB would be to have its native API support streaming large values, but it would probably be a breaking API change.
Dave did a good job explaining the basic consideration behind our value size limit.
Although it does not go into much detail about low-level storage, I added some more information about our architecture in a different thread that might be interesting.
I’ll also point you at the “blob” design recipe. It shows a very simple way to use more than one key-value pair to store a larger value. It does not, however, get a developer past the total transaction size limits. To accomplish this, and to do it in a safe manner, one would have to: