Range scan not being truncated by primary key filter when an additional Or component present in And component

Hi,

For records of type Order as defined below, I want to be able to truncate the total sorted range of records scanned by passing a filter on the primary key but record layer doesn’t seem to allow that when an additional Or component is present in the query as it converts the filter to a disjunctive normal form.

message Order {
    int64 order_id = 1;
    int32 price = 2;
}

The following are the query passed and plan generated for the case when there is no Or component within the And component.

RecordQuery.newBuilder()
    .setRecordType("Order")
    .setFilter(Query.and(
        Query.field("order_id").greaterThanOrEquals(101L),
        Query.field("order_id").lessThanOrEquals(102L)))
    .setSort(Key.Expressions.field("order_id"),false)
    .build();

Query Plan: Scan([[101],>) | [Order] | order_id LESS_THAN_OR_EQUALS 102

The following are the query passed and plan generated for the case when there is an additional Or component within the And component.

RecordQuery query = RecordQuery.newBuilder()
    .setRecordType("Order")
    .setFilter(Query.and(
        Query.field("order_id").greaterThanOrEquals(101L),
        Query.field("order_id").lessThanOrEquals(102L),           
        Query.or(Query.field("price").equalsValue(2), 
                 Query.field("price").equalsValue(2))))
        .setSort(Key.Expressions.field("order_id"),true)
        .build();

Query Plan: Scan(<,>) | [Order] | Or([And([order_id LESS_THAN_OR_EQUALS 102, order_id GREATER_THAN_OR_EQUALS 101, price EQUALS 2]), And([order_id LESS_THAN_OR_EQUALS 102, order_id GREATER_THAN_OR_EQUALS 101, price EQUALS 2])])

Note that in the first query only the greaterThanOrEquals filter on order_id is used to truncate the scan (when the order of filters is switched and lessThanOrEquals filter on order_id is specified before then the lessThanOrEquals filter is used).
Further, in the second query, none of the filters on order_id are used to truncate the scan range. This makes the query slow.
Any thoughts/suggestions on how I can counter this?

That does sound like a bug. In general, it should be able to collapse filters like that, but sometimes it has trouble. See: Planner does not currently coalesce overlapping filters · Issue #1 · FoundationDB/fdb-record-layer · GitHub

You are correct that we convert to DNF, and that does have some implications in the kind of plans we’d produce. In particular, I think a slightly more sophisticated version of the planner would be likely to produce something that is:

(Scan([[101],[102]]) | price EQUALS_VALUE 2) ⋃ (Scan([[101],[102]]) | price EQUALS_VALUE 2)

and not notice that both sides of the union are the same. Or likewise, if the or were something like Query.or(Query.field("price").equalsValue(0), Query.field("price").equalsValue(2)), it might plan it as:

(Scan([[101],[102]]) | price EQUALS_VALUE 0) ⋃ (Scan([[101],[102]]) | price EQUALS_VALUE 2)

Instead of the more efficient:

Scan([[101],[102]]) | OR(price EQUALS_VALUE 0, price EQUALS_VALUE 2)

I think our approach (with the new planner) would be to continue doing DNF normalization, and then optimizing the resulting structure, e.g., noticing that the left side of the union is common and that the first can be turned into the second. I could be wrong though.

But putting DNF aside, I think our or handling in general isn’t perfect. This isn’t the most user friendly approach, but it might be easier to execute the query more directly. Something like:

recordStore.scanRecords(Tuple.from(101), Tuple.from(102), TupleRange.RANGE_INCLUSIVE,  TupleRange.RANGE_INCLUSIVE, null /* for continuation not right if resuming from previous execution */, ScanProperties.REVERSE_SCAN)
    .filter(rec -> Query.or(Query.field("price").equalsValue(2), Query.field("price").equalsValue(2)).eval(recordStore, EvaluationContext.EMPTY, rec))

Or something like that. You could also, if you’re extra brave, construct the optimal RecordQueryPlan from plan elements, though therein lies dragons.

Thanks @alloc for the insight.

I think I’ll go forward with the scanRecords implementation for now.