What's so special about proto3 and ordinal 0 for enums?

Description

We are using proto3 to serialize our records and we wanted to create an
index based on an enum.
Unfortunately, it seems the ordinal 0 cannot be indexed at all and is ignored.
I cannot find anything in the documentation, but according to the tests it seems
to be something “expected” or at least known.

As proto3 requires a 0 for an enum, I end up with the following definition.

enum Payment {
    INVALID  = 0; // proto3 requires an enum to start with 0 but it's an invalid value for FDB Record Layer
    PREPAID  = 1;
    POSTPAID = 2;
    WIRED    = 3;
}

And then, after inserting a bunch of data of all those types, I end up with the following keys.
Everything created as INVALID hasn’t been indexed at all.

fdb> clearrange \x00 \xff
Committed (44052630238)
fdb> getrange \x00 \xff 1000

Range limited to 1000 keys
`\x15\x01\x02demostore\x00\x15\x02\x02account_payment_count\x00\x15\x01' is `\x01\x00\x00\x00\x00\x00\x00\x00'
`\x15\x01\x02demostore\x00\x15\x02\x02account_payment_count\x00\x15\x02' is `\x01\x00\x00\x00\x00\x00\x00\x00'
`\x15\x01\x02demostore\x00\x15\x02\x02account_payment_count\x00\x15\x03' is `\x03\x00\x00\x00\x00\x00\x00\x00'

Questions

  • Is the behaviour expected?
  • Am I reaching the wrong conclusions here?

The basic answer is that this is an unfortunate artifact of how the Record Layer’s type system and proto3 field nullability interact poorly. More details here: FoundationDB Record Layer Overview

But the short answer here is that, when we first developed the Record Layer, proto3 did not allow one to distinguish a scalar field set to its default value from a field that is not set at all. For enums, the default is the value associated with ordinal 0. So, assuming you have something like:

message Account {
    Payment payment = 1;
}

If you then evaluate a key expression like field("payment") on a message where the value is set to INVALID, the Record Layer asks the message whether the payment field is set (via the hasField method on Message), which returns false. At that point, the Record Layer chooses to interpret that field as unset, and so the key expression returns null.
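To make the mechanism concrete, here is a minimal plain-Java sketch of that evaluation. It does not use the protobuf or Record Layer libraries: ToyAccount, hasPayment, and evaluatePaymentKey are hypothetical stand-ins for a proto3 message without field presence, for Message.hasField, and for evaluating field("payment") with the default null handling.

```java
// Toy model only: mimics a proto3 enum field that has no explicit presence.
// None of these names exist in protobuf or the Record Layer.
final class ToyAccount {
    // Enum ordinal as stored in the message; 0 is the proto3 default (INVALID above).
    private final int paymentOrdinal;

    ToyAccount(int paymentOrdinal) {
        this.paymentOrdinal = paymentOrdinal;
    }

    // Without explicit presence, "set to the default" and "never set" look the same,
    // so presence can only be reported for non-default values.
    boolean hasPayment() {
        return paymentOrdinal != 0;
    }

    int getPayment() {
        return paymentOrdinal;
    }

    // Stand-in for evaluating field("payment"): a field reported as absent
    // evaluates to null rather than to its default value.
    static Integer evaluatePaymentKey(ToyAccount account) {
        return account.hasPayment() ? Integer.valueOf(account.getPayment()) : null;
    }
}
```

In this model, a ToyAccount built with ordinal 0 evaluates to null, while one built with ordinal 3 (WIRED) evaluates to 3.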

What index type are you using? I’d expect this behavior if you were using COUNT_NOT_NULL grouped by field("payment"). If the type were COUNT, I’d expect there to be an entry for INVALID, though that index entry would correspond to payment IS NULL rather than payment = INVALID. That would look like this key:

`\x15\x01\x02demostore\x00\x15\x02\x02account_payment_count\x00\x00' is `\x01\x00\x00\x00\x00\x00\x00\x00'
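The difference between the two index types can be sketched with a toy grouping function in plain Java. This is only a model of the behaviour described above, not the Record Layer implementation: ToyCountIndex and its skipNull flag are hypothetical, and a null key stands for a payment field that was treated as unset.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: counts records per evaluated grouping key.
final class ToyCountIndex {
    // evaluatedKeys holds one entry per record; null means the field was treated as unset.
    // skipNull = true models COUNT_NOT_NULL, skipNull = false models COUNT.
    static Map<Integer, Long> build(List<Integer> evaluatedKeys, boolean skipNull) {
        Map<Integer, Long> counts = new HashMap<>(); // HashMap permits a null key
        for (Integer key : evaluatedKeys) {
            if (key == null && skipNull) {
                continue; // COUNT_NOT_NULL: records with a null key leave no entry
            }
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

Fed with one PREPAID, one POSTPAID, three WIRED, and some INVALID records (which evaluate to null), the skipNull variant yields only the three groups seen in the getrange output, while the other variant adds one more group under the null key.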

To get around this, there are a few things you could do. One would be to define your index with a non-default NullStandin: for example, field("payment", FanType.None, NullStandin.NOT_NULL) will always interpret an unset value as the default value, which would mean you’d start indexing INVALID instead of null. Another would be to use a dummy value as enum value 0. So something like:

enum Payment {
    UNSET = 0; // unused
    INVALID = 1;
    PREPAID = 2;
    POSTPAID = 3;
    WIRED = 4;
}

Then anything where payment = UNSET would be treated just like INVALID payments today, but that (special) value is explicitly supposed to correspond to a null value.

You could also (I think) try a message with the field marked as explicitly optional:

message Account {
    optional Payment payment = 1;
}

I think that would result in hasField actually being able to tell whether the payment field had been set to INVALID (versus unset). However, I don’t think we have verified that behavior with optional proto3 fields yet.

Note that some of this is historical: when we first added proto3 support, proto3 did not have optional fields. (That feature existed in proto2 and was removed in early versions of proto3; it was added back experimentally in protobuf 3.12 and enabled by default in 3.15.) So, if we wanted to have some way of expressing nulls, it seemed to make sense to default to treating default-or-unset fields as unset, and then give users a way to opt in to treating them as default. Now that optional is back, I think we’d like to reverse that decision: treat non-optional default-or-unset fields as their default value, and let a user who wants nullability mark the field as optional. However, we’d need to roll out that change carefully to avoid breaking existing data, as it would be backwards incompatible.

Thank you for this very detailed answer.

What index type are you using? I’d expect this behavior if you were using COUNT_NOT_NULL grouped by field("payment"). If the type were COUNT, I’d expect there to be an entry for INVALID, though that index entry would correspond to payment IS NULL rather than payment = INVALID. That would look like this key:

Definitely! We are using a COUNT_NOT_NULL index. As you mentioned, using COUNT results in having null (\x00) encoded.

To get around this, there are a few things you could do. One would be to define your index with a non-default NullStandin: for example, field("payment", FanType.None, NullStandin.NOT_NULL) will always interpret an unset value as the default value, which would mean you’d start indexing INVALID instead of null.

This seems like a really good alternative for enum values, since it’s unlikely you would want to treat 0 as null here.

It seems there are three alternatives then:

  • Use NullStandin.NOT_NULL to ensure null is never inserted
  • Insert a dummy value on the enum which is supposed to map to null
  • Handle the nullability on the top-level type to not have to handle 0

If I understand correctly, the last option would make indexing less convenient.
Anyway, I think the first two options are good enough.

Thanks a lot for your time.