When values are an order of magnitude or more larger than keys, it’s worth pointing out that while a “getRange but only keys” API would save network bandwidth and the associated CPU overhead, at the storage engine level the full KV pairs in the result set are still visited(*) and are therefore read from disk into cache if not already cached.
In your scenario @jkominek, your storage servers would still end up reading the ~8.1kB KV pairs from disk just to return the keys. So your keys-only query byte yield would be something like 1-2% of the bytes read from disk (caching aside).
If you want to frequently scan long sequences of keys only, without subsequently reading most of their values (which would be cached after the scan as a byproduct), you may want to consider a secondary index of sorts: essentially the keys are stored twice, under different subspaces, and only one subspace contains values. Transactions would of course have to modify both subspaces when keys change. The storage cost would be only another 1-2%, but you would be able to scan either Keys or Keys+Values with a high query byte yield per disk byte read.
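A minimal sketch of that dual-subspace layout, modeled with a plain dict standing in for the ordered KV store (the `K/` and `KV/` subspace prefixes are hypothetical names, not an FDB API; in practice both writes would go in the same transaction):

```python
# Model of the dual-subspace idea: every record is written twice,
# once with its value and once keys-only.
store = {}  # stands in for the ordered key-value store

def set_record(key, value):
    store["KV/" + key] = value  # full record
    store["K/" + key] = b""     # keys-only copy (tiny value)

def delete_record(key):
    store.pop("KV/" + key, None)
    store.pop("K/" + key, None)

def scan_keys():
    # Range scan over the keys-only subspace: the KV pairs visited
    # carry almost no value bytes, so query byte yield stays high.
    return sorted(k[len("K/"):] for k in store if k.startswith("K/"))

def scan_keys_values():
    # Range scan over the full subspace when values are wanted.
    return sorted((k[len("KV/"):], v) for k, v in store.items()
                  if k.startswith("KV/"))
```

For example, after `set_record("a", b"x" * 8100)` a `scan_keys()` touches only the `K/` copy, not the ~8.1kB value.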
Note that this is slightly different from what is normally meant by a “secondary index”, which probably looks more like (key → small_unique_id) plus (small_unique_id → full_value); with that model, a Keys+Values range scan would have to be a key range scan plus one point lookup per key to fetch its value.
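For contrast, a sketch of that conventional indirection layout (names hypothetical), showing why a Keys+Values scan turns into a range scan plus one point lookup per key:

```python
# Conventional secondary-index layout: the value lives behind an id.
index = {}  # key -> small_unique_id
heap = {}   # small_unique_id -> full_value

def insert(key, uid, value):
    index[key] = uid
    heap[uid] = value

def scan_keys_values():
    # The range scan over `index` yields keys cheaply, but each
    # value requires a separate point lookup into `heap`.
    return [(k, heap[index[k]]) for k in sorted(index)]
```

Keys-only scans are cheap in either layout; the difference is that here every value read pays an extra lookup.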
(*) This is currently true for all disk-backed storage engines in FDB. Storage engines do exist that separate keys and values, which could facilitate faster key-only range scans, but at the cost of one additional lookup per value when values are needed.