Record Layer `Field` proto message

While studying the Record Layer code, I noticed that record_metadata.proto defines the Field message as follows:

message Field {
  required string field_name = 1;

  // [...]
}

In protobuf, a field (column) is uniquely identified by its field number. I tried to understand the implications of using the field name instead, and found the following comment in SchemaEvolution.md:

However, index and primary key definitions in the meta-data itself are based on field names, so these will also need to be updated in exactly the same way. At present, to avoid this appearing to be a change requiring an index rebuild, the index’s subspace key and version need to be copied from the old definition.

What was the motivation for using the field name instead of the field number? I am trying to understand the pros and cons.

For my implementation, I am considering something like the following instead:

message Field {
  required uint32 field_number = 1;

  // [...]
}

Happy Holidays! :slight_smile:

One reason to use name instead of number is that you can have an index definition (or common primary key) that spans across types as long as all types have that field defined on them. So, for example, you could have ChildType1 and ChildType2 that both had a parent_id field that pointed to some ParentType by its primary key (say), and then you could define a multi-type index on ChildType1 and ChildType2 on the parent_id field that would let you query for all children (of both types) for a given parent.

Now, you could do the same thing with field numbers instead of field names, but I think the same field name is used across types to represent the same information more often than the same field number is. (Or, perhaps, it’s easier to retroactively change a protobuf definition so that the same name is used for the same data across messages than it is to align the numbers, which can’t be changed after the fact.)
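
To make the multi-type case concrete, here is a minimal sketch using hypothetical message definitions. Only the type names and parent_id come from the example above; the field numbers and the other fields are assumptions made up for illustration, not anything taken from the Record Layer sources.

```protobuf
syntax = "proto2";

// Hypothetical record types, for illustration only.
message ParentType {
  optional int64 id = 1;
}

message ChildType1 {
  optional int64 id = 1;
  optional int64 parent_id = 2;   // same name as in ChildType2 ...
  optional string label = 3;
}

message ChildType2 {
  optional int64 id = 1;
  optional string description = 2;
  optional int64 parent_id = 5;   // ... but a different field number
}
```

An index definition keyed on the name parent_id can refer to the same logical field in both child types. A definition keyed on a field number would have to cope with 2 in one type and 5 in the other, unless the numbers had been aligned when the messages were first written. Renaming a field later to make the names match does not change the serialized data, while changing a field number does.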

But that is a bit niche. For single-type indexes, using names doesn’t really buy you much, other than maybe being slightly easier for humans to look at and understand (though that can be fixed by having whatever user interface the human is looking through look up the field name by number). Using the field number instead would be fine for most purposes, I think.

Thanks @alloc for the reply!