Hi there, I’ve been working on a simple Spark adapter on top of the Record Layer.
Spark has a notion of pushing down filters to the underlying datastore. That would be a good fit for an index scan or a primary key scan, versus doing a full table scan with client-side (Spark client) filtering. However, part of shaping how that query looks to Spark is dividing up the work into partitions. For the primary key that part is easy: basically use the
getPrimaryKeyBoundaries function. For an index, though, I don’t think it would really matter, right? Indexes aren’t materialized, correct? They’re just pointers back to the primary key?
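For context, here’s a minimal sketch of the primary-key partitioning step I have in mind. It uses hypothetical string keys standing in for the Tuples that getPrimaryKeyBoundaries would actually return, and turns the boundary list into contiguous half-open ranges that could back Spark partitions:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // A partition covering [begin, end); null means unbounded on that side.
    record Partition(String begin, String end) {}

    // Turn the boundary keys (as returned by getPrimaryKeyBoundaries)
    // into contiguous partition ranges: N boundaries -> N + 1 partitions.
    static List<Partition> toPartitions(List<String> boundaries) {
        List<Partition> partitions = new ArrayList<>();
        String begin = null; // first partition is unbounded below
        for (String boundary : boundaries) {
            partitions.add(new Partition(begin, boundary));
            begin = boundary;
        }
        partitions.add(new Partition(begin, null)); // last is unbounded above
        return partitions;
    }

    public static void main(String[] args) {
        // Hypothetical boundaries; real code would read Tuples from
        // FDBRecordStore.getPrimaryKeyBoundaries(...).
        List<Partition> parts = toPartitions(List.of("k100", "k200"));
        System.out.println(parts);
    }
}
```

Each partition would then become one Spark task scanning its [begin, end) primary-key range.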
Thanks for the help.