scanRecords without deserialization

arnaudgeiser · July 16, 2021, 8:17am

Hello everyone,

We benchmarked the FDB Record Layer (FDBRecordStore#scanRecords) against the
FDB Java API.
With the shape of our keys and values, the FDB Java API was about 3 times
faster.

Context

We have some use cases where we are only interested in the data contained in the
key, not in the value. Thus, we don’t want to spend time deserializing the
value into a Record.

The problem

With scanRecords, there seems to be no way to skip the deserialization, because
the method has to return a cursor of FDBStoredRecord:

github.com

FoundationDB/fdb-record-layer/blob/59f82818f019e78d7639b9e6124c1aeb87082929/fdb-record-layer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/FDBRecordStore.java#L1099

    
      
            }
            final RecordMetaData metaData = metaDataProvider.getRecordMetaData();
            final ReadTransaction tr = snapshot ? ensureContextActive().snapshot() : ensureContextActive();
            return SplitHelper.loadWithSplit(tr, context, recordsSubspace(),
                    primaryKey, metaData.isSplitLongRecords(), omitUnsplitRecordSuffix, sizeInfo);
        }

        @Override
        @Nonnull
        public RecordCursor<FDBStoredRecord<Message>> scanRecords(@Nullable final Tuple low, @Nullable final Tuple high,
                                                                  @Nonnull final EndpointType lowEndpoint, @Nonnull final EndpointType highEndpoint,
                                                                  @Nullable byte[] continuation,
                                                                  @Nonnull ScanProperties scanProperties) {
            return scanTypedRecords(serializer, low, high, lowEndpoint, highEndpoint, continuation, scanProperties);
        }

        @Nonnull
        public <M extends Message> RecordCursor<FDBStoredRecord<M>> scanTypedRecords(@Nonnull RecordSerializer<M> typedSerializer,
                                                                                     @Nullable final Tuple low, @Nullable final Tuple high,
                                                                                     @Nonnull final EndpointType lowEndpoint, @Nonnull final EndpointType highEndpoint,

In the current codebase, the only method I can find that returns an FDBRawRecord is FDBRecordStore#loadRawRecordAsync.

My question: Is there anything in the API that returns a cursor of FDBRawRecord-like objects and skips the deserialization?
Or, in that case, do we have to fall back to the FDB Java API?

Our idea

If at the moment we have to fall back to the FDB Java API, do you think the FDB Record Layer team
would accept a PR to introduce a new method in FDBRecordStore with the following signature:

public RecordCursor<FDBRawRecord> scanRawRecords(@Nullable final Tuple low, @Nullable final Tuple high,
                                                 @Nonnull final EndpointType lowEndpoint, @Nonnull final EndpointType highEndpoint,
                                                 @Nullable byte[] continuation,
                                                 @Nonnull ScanProperties scanProperties)

In any case, thank you for the awesome work you did.

MMcM · July 20, 2021, 1:03am

I believe that you are right that there is nothing to do what you want today.

I cannot think of any objection to an issue or pull-request for adding a new scanRawRecords.

spullara · July 29, 2021, 4:57am

Seems like the right solution here would be to make a LazyFDBStoredRecord that deserializes on demand. Could be more generally useful. Also, still hoping for a scan that only returns keys over the wire unless that has been added since I last checked.

MMcM · July 29, 2021, 6:26am

To be useful, the new LazyFDBStoredRecord would also need a getRawRecord method. That means one of two things: either it has a strange contract where getRawRecord stops working as soon as you call getRecord, or memory consumption is effectively doubled because it keeps the original bytes around just in case, and those bytes are no longer immediately eligible for GC after conversion, as they are now.

The C API does not have just key scanning, no.
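
To make the two possible contracts for a lazy record concrete, here is a minimal sketch in plain Java. LazyRecord, its keepRaw flag, and isDeserialized are hypothetical names invented for illustration; none of this is Record Layer code, and a String stands in for the deserialized protobuf Message.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical stand-in for a lazily deserialized record; not a Record Layer class. */
final class LazyRecord<T> {
    private byte[] raw;                               // serialized bytes as read from the store
    private T decoded;                                // cached result of deserialization
    private boolean isDecoded;                        // tracks whether decoding already happened
    private final Function<byte[], T> deserializer;   // assumed deserialization function
    private final boolean keepRaw;                    // true: keep the bytes after decoding

    LazyRecord(byte[] raw, Function<byte[], T> deserializer, boolean keepRaw) {
        this.raw = raw;
        this.deserializer = deserializer;
        this.keepRaw = keepRaw;
    }

    /** Deserializes on first call and caches the result. */
    T getRecord() {
        if (!isDecoded) {
            decoded = deserializer.apply(raw);
            isDecoded = true;
            if (!keepRaw) {
                raw = null; // contract 1: release the bytes, so getRawRecord stops working
            }
            // contract 2 (keepRaw == true): bytes and decoded value are both retained
        }
        return decoded;
    }

    /** Returns the serialized bytes, or fails if they were released by getRecord. */
    byte[] getRawRecord() {
        if (raw == null) {
            throw new IllegalStateException("raw bytes were released by getRecord()");
        }
        return raw;
    }

    boolean isDeserialized() {
        return isDecoded;
    }
}

final class LazyRecordDemo {
    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        Function<byte[], String> utf8 = b -> new String(b, StandardCharsets.UTF_8);

        // Contract 1: raw access is only valid until the first getRecord call.
        LazyRecord<String> strict = new LazyRecord<>(bytes, utf8, false);
        System.out.println(strict.getRawRecord().length);
        System.out.println(strict.getRecord());

        // Contract 2: both forms stay reachable, at the cost of holding both in memory.
        LazyRecord<String> keeping = new LazyRecord<>(bytes, utf8, true);
        System.out.println(keeping.getRecord());
        System.out.println(keeping.getRawRecord().length);
    }
}
```

With keepRaw set to false, a caller that only needs the raw form never pays for deserialization, but calling getRawRecord after getRecord throws. With keepRaw set to true, both calls always work, and each record holds its bytes and its decoded value for as long as it is referenced.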
