scanRecords without deserialization

Hello everyone,

We benchmarked the FDB Record Layer (FDBRecordStore#scanRecords) against the plain
FDB Java API.
For the shape of our keys and values, the FDB Java API was faster by a factor
of 3.

Context

We have some use cases where we are only interested in the data contained in the
key, not in the value. So we don’t want to spend time deserializing the
value into a Record.

The problem

Using scanRecords, there seems to be no way to skip the deserialization, as
the signature has to return an FDBStoredRecord.

In the current codebase, I can’t find any method other than FDBRecordStore#loadRawRecordAsync that returns an FDBRawRecord.

My question: Is there anything in the API that returns a cursor of FDBRawRecord-like objects and skips the deserialization?
Or, in that case, do we have to fall back to the FDB Java API?

Our idea

If at the moment we have to fall back to the FDB Java API, do you think the FDB Record Layer team
would accept a PR to introduce a new method in FDBRecordStore with the following signature:

public RecordCursor<FDBRawRecord> scanRawRecords(@Nullable final Tuple low, @Nullable final Tuple high,
                                                 @Nonnull final EndpointType lowEndpoint, @Nonnull final EndpointType highEndpoint,
                                                 @Nullable byte[] continuation,
                                                 @Nonnull ScanProperties scanProperties)

In any case, thank you for the awesome work you did.

I believe that you are right that there is nothing to do what you want today.

I cannot think of any objection to an issue or pull-request for adding a new scanRawRecords.

Seems like the right solution here would be to make a LazyFDBStoredRecord that deserializes on demand. Could be more generally useful. Also, still hoping for a scan that only returns keys over the wire unless that has been added since I last checked.

To be useful, the new LazyFDBStoredRecord would also need to have the getRawRecord method. That means one of two things. Either it has a strange contract where getRawRecord stops working as soon as you call getRecord, or memory consumption is effectively doubled because it keeps the original bytes around just in case, and those bytes can no longer be GC’ed immediately after conversion as they can now.
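
To make that trade-off concrete, here is a minimal sketch in plain Java. It is not Record Layer code: LazyRecord, the keepRawAfterDecode flag, and the Function-based deserializer are hypothetical names used only for illustration, and only getRecord and getRawRecord mirror the method names discussed in this thread. It assumes single-threaded use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical sketch (not part of the Record Layer): holds the raw value
 * bytes and runs the deserializer only on the first getRecord() call.
 * Not thread-safe.
 */
final class LazyRecord<T> {
    private final byte[] key;
    private final Function<byte[], T> deserializer;
    private final boolean keepRawAfterDecode;
    private byte[] rawValue;   // released after decoding unless keepRawAfterDecode
    private T decoded;         // cached result of the deserializer
    private boolean isDecoded;

    LazyRecord(byte[] key, byte[] rawValue,
               Function<byte[], T> deserializer, boolean keepRawAfterDecode) {
        this.key = Objects.requireNonNull(key);
        this.rawValue = Objects.requireNonNull(rawValue);
        this.deserializer = Objects.requireNonNull(deserializer);
        this.keepRawAfterDecode = keepRawAfterDecode;
    }

    /** Reading the key never touches the deserializer. */
    byte[] getKey() {
        return key;
    }

    /** Deserializes at most once; later calls return the cached object. */
    T getRecord() {
        if (!isDecoded) {
            decoded = deserializer.apply(rawValue);
            isDecoded = true;
            if (!keepRawAfterDecode) {
                rawValue = null; // option 1: free the bytes, getRawRecord() now fails
            }
            // option 2 (keepRawAfterDecode): both forms stay reachable
        }
        return decoded;
    }

    /** Returns the raw bytes, or throws if they were released by getRecord(). */
    byte[] getRawRecord() {
        if (rawValue == null) {
            throw new IllegalStateException("raw bytes were released after getRecord()");
        }
        return rawValue;
    }
}

final class LazyRecordDemo {
    public static void main(String[] args) {
        int[] decodeCalls = {0};
        // Stand-in deserializer that counts how often it runs.
        Function<byte[], String> decode = bytes -> {
            decodeCalls[0]++;
            return new String(bytes, StandardCharsets.UTF_8);
        };
        byte[] key = "k1".getBytes(StandardCharsets.UTF_8);
        byte[] value = "v1".getBytes(StandardCharsets.UTF_8);

        // Key-only use: the deserializer is never invoked.
        LazyRecord<String> keyOnly = new LazyRecord<>(key, value, decode, false);
        System.out.println("key length: " + keyOnly.getKey().length
                + ", decode calls: " + decodeCalls[0]);

        // Keeping the raw bytes: both accessors work, at the cost of holding both forms.
        LazyRecord<String> both = new LazyRecord<>(key, value, decode, true);
        both.getRecord();
        both.getRecord();
        System.out.println("raw length: " + both.getRawRecord().length
                + ", decode calls: " + decodeCalls[0]);
    }
}
```

With keepRawAfterDecode set to false you get the "strange contract" (getRawRecord throws once getRecord has been called). With it set to true, both the bytes and the decoded object stay reachable for the lifetime of the record.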

The C API does not have just key scanning, no.