FunctionKeyExpression is not honored when evaluating the QueryPlan

rahul-nitkkr · February 11, 2020, 2:48pm

FunctionKeyExpression don’t seem to be honored in the evaluation of a query plan.
We came across a test casewhich exposes this .
Is there any plan to make FunctionKeyExpressions allowed to be picked for indexes ?
Are there any workarounds for the same ?

scgray · February 11, 2020, 3:56pm

No, queries cannot currently use FunctionKeyExpressions. The main problem is that the they currently have to way to express the structure of the index keys that they produce in a manner that would be meaningful to the planner. For example, you could have a function that picks the third letter of every string field in the record and concatenates them to produce a fan-out index key — there planner has absolutely no insight into what is going on in the guts of the function to know how query arguments could be applied against the index.

There is work going on in the planner that I will let others chime in on that may pave a path to having fully exposed function based indexes.

What sort of index are you trying to create?

rahul-nitkkr · February 11, 2020, 4:18pm

Thank you for the insight @scgray.

Our use case is pretty straightforward, We’re trying to create Uppercase string indexes on a few fields of our data.

scgray · February 11, 2020, 4:25pm

Oh, ok, then you probably want to take a look at collating indexes. These are in the fdb-record-layer-icu package. Here is an example: https://github.com/FoundationDB/fdb-record-layer/blob/master/fdb-record-layer-icu/src/test/java/com/apple/foundationdb/record/icu/TextCollatorICUTest.java

rahul-nitkkr · February 11, 2020, 5:28pm

Ah ok! Let me take a look at that. I’ll come back with my findings.

MMcM · February 11, 2020, 7:08pm

First, I will second Scott’s recommendation that you try a collating index with case-insensitivity (3) strength. They are designed for just this kind of use.

That said, an index defined with a simple function, such as one that uppercases, on a field can in fact be used in a query. The restriction is that the entire key expression that defines the index must be used verbatim in the query with [in]equality predicates. And the FunctionKeyExpression implementation must be marked as a QueryableKeyExpression.

So, given UPPER(f), you can do the equivalent of `WHERE UPPER(f) >= ‘e’ AND UPPER(f) <= ‘m’ to scan such an index between those bounds.

What doesn’t work is planner support for “understanding” what the index does. An imaginary case of this would be something like WHERE UPPER(f >= 'e' AND f <= 'm').

I have created a pull request to fix the test that you found to demonstrate the simple case that does work today.

VibhutiD · March 3, 2020, 12:09pm

Hi Mike,

I have the same use-case. I am trying to create a uppercase testCollate index and referring to FDBCollateQueryTest class but my query is not behaving as expected. I’m not sure if my inference is correct.

Message Order {
         int32 order_id = 1;
         int32 price = 2;
         string buyer = 3;
}

I have inserted data with values of buyer : [“foo”, “bar”, “vibhuti”]. My use-case is that user queries buyer: “FOO” and it returns the order with value “foo” (case-insensitive search). For the above model, the index that I have created is:

KeyExpression NAME_FIELD = function(CollateFunctionKeyExpressionFactoryJRE.FUNCTION_NAME, field("buyer"));
builder.addIndex("order", "buyerIndex", NAME_FIELD );

And the query that I’m firing is (to make sure it picks up this index):

RecordQuery query = RecordQuery.newBuilder().setRecordType("Order").setFilter(Query.field("buyer").equalsValue("vibhuti")).setAllowedIndex("buyerIndex").setRequiredResults(field("buyer)).setSort(field("buyer")).build();

When I fire this query, it throws me an error stating “Cannot sort without appropriate index: Field { ’ buyer’ None}.” The index has already been registered (I have verified this with an IndexScan.). If I remove the setSort method, it does a full range scan instead.

Is the index I have constructed correct? If not, could you please guide me as to how to proceed further?
Thanks.

MMcM · March 3, 2020, 7:02pm

Your query and sort expressions must match the collating index; that is, they must involve the collate function expression.

Unlike in SQL, where COLLATE clauses affect the behavior of the column itself in indexes and queries, in the record layer that remains unaffected and applications of a collation function are indexed and queried.

Specifically, instead of Query.field("buyer") use Query.keyExpression(NAME_FIELD) and instead of Key.Expressions.field("buyer") again NAME_FIELD. Review the test class you referred to again and note how it does that.

Since there is an equality filter on the collated field, the sort won’t make any difference in this query. But I assume that it will in the actual use case.

VibhutiD · March 4, 2020, 6:45am

Ahh got it. Yes the use-case is more complex than this. It works now.
Thanks a ton.

Topic		Replies	Views
Extending the Record layer query and planner Record Layer	1	382	October 28, 2022
Mixed ascending/descending sort? Record Layer	1	1067	August 12, 2019
Multiple questions about Indexes, functions and watches to implement etcd-layer Record Layer	7	3051	June 3, 2020
A filter similar in sql to `where (a,b,c) > (1,2,3)` Using Layers	2	186	May 10, 2024
Full range scan performed in sort when not required Record Layer	3	982	March 3, 2020

FunctionKeyExpression is not honored when evaluating the QueryPlan

Related topics