We benchmarked the FDB Record Layer (FDBRecordStore#scanRecords) against the plain
FDB Java API.
With the shape of our keys and values, the FDB Java API was faster by a factor of 3.
Context
We have some use-cases where we are only interested in the data contained in the
key, not in the value. We therefore don’t want to spend time deserializing the
value into a Record.
The problem
Using scanRecords, there seems to be no way to skip the deserialization, because
its signature returns FDBStoredRecord instances.
In the current codebase, I could not find any method other than FDBRecordStore#loadRawRecordAsync that returns an FDBRawRecord.
My question: is there anything in the API that returns a cursor of FDBRawRecord-like objects and skips the deserialization?
Or do we have to fall back to the FDB Java API in that case?
Our idea
If we currently have to fall back to the FDB Java API, would the FDB Record Layer team
accept a PR that introduces a new method in FDBRecordStore with the following signature:
    public RecordCursor<FDBRawRecord> scanRawRecords(@Nullable final Tuple low, @Nullable final Tuple high,
                                                     @Nonnull final EndpointType lowEndpoint, @Nonnull final EndpointType highEndpoint,
                                                     @Nullable byte[] continuation,
                                                     @Nonnull ScanProperties scanProperties)
In any case, thank you for the awesome work you did.
Seems like the right solution here would be to make a LazyFDBStoredRecord that deserializes on demand. Could be more generally useful. Also, still hoping for a scan that only returns keys over the wire unless that has been added since I last checked.
To be useful, the new LazyFDBStoredRecord would also need a getRawRecord method. That means one of two things. Either it has a strange contract where getRawRecord stops working as soon as you call getRecord, or memory consumption is effectively doubled because it keeps the original bytes around, just in case, and those bytes can no longer be GC’ed immediately after conversion as they can now.
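To make that trade-off concrete, here is a minimal sketch in plain Java, with no Record Layer dependency. Everything in it is hypothetical: LazyRecord, the retainRaw flag and the deserializer function are names made up for illustration, not existing API, and the sketch ignores everything else a real stored record carries. It only shows the two contracts discussed above: releasing the raw bytes after the first getRecord call, or keeping them and paying for both copies.

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical sketch of a lazily deserialized record (not Record Layer API).
 * K is the primary key type, V is the deserialized record type.
 */
final class LazyRecord<K, V> {
    private final K primaryKey;
    private final Function<byte[], V> deserializer;
    // Assumption for illustration: the caller picks the contract up front.
    private final boolean retainRaw;

    private byte[] rawBytes;      // released after deserialization unless retainRaw is set
    private V value;              // populated by the first getRecord() call
    private boolean deserialized;

    LazyRecord(K primaryKey, byte[] rawBytes,
               Function<byte[], V> deserializer, boolean retainRaw) {
        this.primaryKey = Objects.requireNonNull(primaryKey);
        this.rawBytes = Objects.requireNonNull(rawBytes);
        this.deserializer = Objects.requireNonNull(deserializer);
        this.retainRaw = retainRaw;
    }

    /** Always available and never triggers deserialization. */
    K getPrimaryKey() {
        return primaryKey;
    }

    synchronized boolean isDeserialized() {
        return deserialized;
    }

    /** Deserializes on first use and caches the result. */
    synchronized V getRecord() {
        if (!deserialized) {
            value = deserializer.apply(rawBytes);
            deserialized = true;
            if (!retainRaw) {
                // "Strange contract" variant: drop the bytes so they can be collected.
                rawBytes = null;
            }
        }
        return value;
    }

    /** Fails once the bytes have been released by getRecord(). */
    synchronized byte[] getRawRecord() {
        if (rawBytes == null) {
            throw new IllegalStateException("raw bytes were released after deserialization");
        }
        return rawBytes;
    }
}
```

A key-only consumer would call getPrimaryKey and never pay for deserialization. With retainRaw set to false, getRawRecord throws after the first getRecord call; with retainRaw set to true, both the bytes and the deserialized object stay reachable for the lifetime of the wrapper, which is the doubled memory footprint described above.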