Is there a way to iterate through keys without reading the values? Seems like this would save a lot of network traffic if the values are not relevant. (I’m using the C API, so I’m looking for some form of fdb_transaction_get_range that only returns the keys.)
I was wondering what would be the use-case for such an API?
From what I understand, the place where having keys without values would be useful would be in secondary indexes, in which case the value is not set. So, we would not see additional network traffic.
Anytime we are reading a key, it seems to me we would be interested in its value or it would be a key that contains an empty value (for example indexes).
I’ve got a design where I always pull certain keys with a range access, and the values are significantly larger than the key (~128 bit keys, and 8kB values). I store some metadata about how the value is compressed at the end of the key, (“no compression”, “zstd level 1”, “zstd level 10”, whatever)
If I could do key-only range scans, I could more efficiently grab huge swaths of those keys, to inspect that metadata.
I feel like a number of additional use cases could be concocted given some time and effort.
My immediate use case is diagnostic tooling, but I can think of others – for example, denormalize one attribute out of the value into the key (not as a prefix), then you can do top-N computations on this attribute by reading just the keys, not the values. Of course you can do the same with a secondary index, but maybe in rare cases you’d rather spend the extra CPU instead of storing 2x the number of keys.
When values are an order of magnitude or more larger than keys, it’s worth pointing out that while a “getRange but only keys” API would save network bandwidth and associated CPU overhead, at the storage engine level the full KV pairs represented in the result set are still visited(*) and therefore are read from disk into cache if not cached.
In your scenario @jkominek your storage servers would still end up reading the ~8.1kB KV pairs from disk just to get the keys. So your keys-only query byte yield would be something like 1-2% of the disk read bytes (caching aside).
If you want to frequently scan long sequences of keys-only and not subsequently read most of their values (which would be cached after the scan as a byproduct) you may want to consider a secondary index of sorts where essentially the keys are stored twice but under difference subspaces and only one subspace contains values. Transactions would of course have to modify both subspaces when there are key changes. The storage cost would be only another 1-2% but you would be able to scan either Keys or Keys+Values with high query byte yield per disk byte yield.
Note that this is a slightly different proposal from what is normally meant by “secondary index” which probably looks more like (key → small_unique_id) and (small_unique_id → full_value) as with this model Keys+Values range scans would have to be a key range scan + one point lookup per key to get its value.
(*) This is currently true for all disk-backed storage engines in FDB. Storage engines do exist which separate keys and values, which could facilitate faster key-only range scans but at a cost of one additional lookup per value when values are needed.
I suspected as much, which is why I hadn’t even asked about such functionality, myself. I’d use a call if it was put in, but wouldn’t expect it to improve much of anything for that particular use case.
Hi, I couldn’t find any info on Index-only scans, do you have any pointers for that? Or are you suggesting manually maintaining a separate record type just for “indexing”? Thanks!
For future reference I found a variant of .scanIndex that although is deprecated, it appear to allow scanning an index without retrieving the values, the signature is (.scanIndex ctx index scanType range continuation scanProperties recordScanLimiter)
I’m suggesting the latter. Since the Key and Value for a given record are physically stored together, it could be beneficial to have a keys-only variation of your records under a different prefix so when you scan the keys-only space you aren’t reading the values from disk into cache in addition to not returning them.