We have a custom use case, for which we implemented our own layer in Java. But we concluded that it was more or less the same as Record Layer, and we are now evaluating Record Layer.
The first hurdle we have hit is that bulk migration of data (terabytes) via the Record Layer is really slow.
Here are the details of the calls that take a lot of time:
In our own implementation, we initialized a table without any indexes, migrated the data fully, and then created indexes later (if required).
In our implementation, populating a table without any indexes performed only set() operations and no get() operations. We relied on the fact that our (custom) saveRecord() implementation would overwrite the previous value for a key (if present).
But in Record Layer, the saveRecord() API performs a get() to fetch the old record and then a set() for the new one. Because of these extra operations, we are now seeing slowness in migration and sometimes "transaction too old" errors. In my (short) experience, a transaction with only set() operations is very fast and reliable.
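To make the cost difference concrete, here is a minimal sketch in plain Java (an in-memory map stands in for an FDB transaction; ToyTransaction and the method names are illustrative, not the real Record Layer API). It counts operations for a blind write versus a load-then-write:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an FDB transaction; counts operations to show the difference.
class ToyTransaction {
    final Map<String, byte[]> kv = new HashMap<>();
    int gets = 0;
    int sets = 0;

    byte[] get(String key) { gets++; return kv.get(key); }
    void set(String key, byte[] value) { sets++; kv.put(key, value); }
}

public class SaveRecordCost {
    // Our custom saveRecord(): blind write, last writer wins, no read.
    static void blindSave(ToyTransaction tr, String key, byte[] value) {
        tr.set(key, value);
    }

    // Record Layer style: load the old record first, then write.
    static void loadThenSave(ToyTransaction tr, String key, byte[] value) {
        byte[] old = tr.get(key); // extra read per record
        // (real code would use `old` to clear split ranges and index entries)
        tr.set(key, value);
    }

    public static void main(String[] args) {
        ToyTransaction a = new ToyTransaction();
        ToyTransaction b = new ToyTransaction();
        for (int i = 0; i < 1000; i++) {
            blindSave(a, "k" + i, new byte[]{1});
            loadThenSave(b, "k" + i, new byte[]{1});
        }
        System.out.println("blind: gets=" + a.gets + " sets=" + a.sets);           // gets=0 sets=1000
        System.out.println("load-then-save: gets=" + b.gets + " sets=" + b.sets); // gets=1000 sets=1000
    }
}
```

Over a bulk load, the second variant doubles the round trips per record, which is what shows up as slowness and "transaction too old".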
Looking at the saveRecord() API of Record Layer, even when the existenceCheck flag is set to NONE, it fetches the old record for two reasons:
(1) to get the splitting layout of the old record so that it can be cleared effectively before inserting the new record;
(2) to clear the index entries corresponding to the old record.
For #1, if metaData.isSplitLongRecords() == false, we can safely pass previousSizeInfo as null to the saveWithSplit() API.
For #2, if there are no secondary indexes on the recordStore (which we can determine from the metadata cache), we don't need to call updateSecondaryIndexes() at all.
Is this understanding correct?
If yes, does this mean that we can avoid fetching oldRecord for the case when:
- existenceCheck == NONE
- metaData.isSplitLongRecords() == false
- no secondary indexes?
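If that understanding is right, the fast path could be guarded by a check like the one below. This is only a sketch: the names echo Record Layer identifiers (existenceCheck NONE, RecordMetaData.isSplitLongRecords()), but the enum and the metadata stub here are hypothetical stand-ins, not the real classes.

```java
import java.util.Collections;
import java.util.List;

public class FastPathCheck {
    // Hypothetical stand-ins for Record Layer's existence-check option and metadata.
    enum ExistenceCheck { NONE, ERROR_IF_EXISTS, ERROR_IF_NOT_EXISTS }

    static class MetaDataStub {
        final boolean splitLongRecords;
        final List<String> indexes;

        MetaDataStub(boolean splitLongRecords, List<String> indexes) {
            this.splitLongRecords = splitLongRecords;
            this.indexes = indexes;
        }

        boolean isSplitLongRecords() { return splitLongRecords; }
        List<String> getAllIndexes() { return indexes; }
    }

    // A blind write is safe only when all three conditions hold:
    // no existence check, no long-record splitting, and no indexes to maintain.
    static boolean canSkipOldRecordLoad(ExistenceCheck check, MetaDataStub metaData) {
        return check == ExistenceCheck.NONE
                && !metaData.isSplitLongRecords()
                && metaData.getAllIndexes().isEmpty();
    }

    public static void main(String[] args) {
        MetaDataStub bulkLoad = new MetaDataStub(false, Collections.emptyList());
        MetaDataStub indexed = new MetaDataStub(false, List.of("byName"));
        System.out.println(canSkipOldRecordLoad(ExistenceCheck.NONE, bulkLoad)); // true
        System.out.println(canSkipOldRecordLoad(ExistenceCheck.NONE, indexed));  // false
    }
}
```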
Are there many other get() calls in this workflow? If there are, the optimizations above might not be so useful.
Some more info:
In our custom implementation, we implemented the record-splitting logic in a [ (key, 1) -> Split1 ] format. So to clean up a key, we did clear(range(key)), which is still cheaper than a get() operation (based on my short experience). Right now, I am not sure what it is about the splitting logic in Record Layer that requires the old record to be fetched in order to clear it.
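The split layout described above can be modeled with a sorted map, where clearing a record is a single range clear over its key prefix and needs no read at all. This is a toy sketch in plain Java; against real FDB, the equivalent would be a range clear over the record key's tuple-encoded prefix (e.g. tr.clear(Range.startsWith(prefix)) in the FDB Java bindings), not a string-keyed TreeMap.

```java
import java.util.TreeMap;

public class RangeClearSplit {
    // Sorted map standing in for an ordered key space; keys are "recordKey/splitIndex".
    static final TreeMap<String, byte[]> kv = new TreeMap<>();

    // Store a value split across numbered chunks under a common key prefix.
    static void saveSplit(String key, byte[][] chunks) {
        for (int i = 0; i < chunks.length; i++) {
            kv.put(key + "/" + i, chunks[i]);
        }
    }

    // Clear every chunk under the record's prefix without reading it first,
    // analogous to clear(range(key)) over the old split chunks.
    static void clearRecord(String key) {
        kv.subMap(key + "/", key + "/\uffff").clear();
    }

    public static void main(String[] args) {
        saveSplit("order42", new byte[][]{{1}, {2}, {3}});
        clearRecord("order42");
        System.out.println(kv.isEmpty()); // true: all three chunks gone, zero reads
    }
}
```

The point of the sketch: the writer never needs to know how many chunks the old record occupied, because the range clear covers the whole prefix regardless.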