Considerations for key and value sizes

(gaurav) #1


I want to better understand how do key and value sizes determine the performance of various read/write operations. Could some high-level description of how key and values are laid out in the storage system be helpful?

Maintaining efficient key and value sizes is essential to getting maximum performance with FoundationDB. As a rule, smaller key sizes are better, as they can be more efficiently transmitted over the network and compared. In concrete terms, the highest performance applications will keep key sizes below 32 bytes. Key sizes above 10 kB are not allowed, and sizes above 1 kB should be avoided—store the data in the value if possible.

Value sizes are more flexible, with 0-10 kB normal. Value sizes cannot exceed 100 kB. Like any similar system, FoundationDB has a “characteristic value size” where the fixed costs of the random read roughly equal the marginal costs of the actual bytes. For FoundationDB running on SSD hardware, this characteristic size is roughly 1 kB for randomly accessed data and roughly 100 bytes for frequently accessed (cached) data.

(Could someone please explain in simpler terms, the reference to ‘characteristic value size’, mentioned above?)

For instance, if I need to store rows where keys are 64Bytes, and values of roughly 10KB, there are two ways I could think of storing these:

key -> value

keys/key -> pointer
data/pointer' -> value

I would need to do a mixture of key lookups (using KeySelectors as I do not know the exact key) and range queries over ranges of size 10-100. There would be 100’s of reads for every write operation.

In option1, there are fewer rows touched, (read and write time) whereas, in option 2, the index rows are much smaller, which would hypothetically help in faster lookups and fewer disk page reads at the read-time.

How do I go about reasoning such questions when designing the data model?

(gaurav) #2

Just wanted to see if anyone had suggestions on this; or is this totally determined by experimentation.

(Alex Miller) #3

I think it’s trying to explain that in handling in RPC, there’s a fixed cost, and a variable cost that scales with request size. It’s making the claim that the fixed cost and the variable cost are the same (so 2x fixed in total) when the value size is 1KB for disk reads and 100B for memory reads.

However, given varying network latencies and that hardware has changed since a lot of the performance tests in the docs were done, it’s not clear to me how helpful this guidance is.

With back-of-the-envelope approximation math. It’s mostly just looking at what accesses you’ll be doing to your data, how frequently they happen with respect to each other, and playing with the tradeoffs so that it’ll be as cheap as possible overall. The calculation you’d actually want to do is

option1 = average key+value size * average number of keys / (MB/s readable by one range read)
option2 = average key size * average number of keys / (MB/s readable by one range read) + 1 database point read latency

And go with whichever is smaller.

Given the drastic difference between your key and value sizes, I might actually imagine that option2 would be faster for you, assuming your clients are close to your servers.