Question about FDB Record Layer's SIGMOD'19 paper

Hi, I’m new to the FDB Record Layer and decided to try it out for fun after reading the exciting SIGMOD’19 paper. I have some questions:

  1. When building a universal index, we need to scan all the records stored in a record store and apply the given key expression to them. How does the Record Layer determine the type of a binary value stored in FDB? In other words, how does a key expression map a string field name to the actual value inside the binary value?

  2. Is the version of each record stored adjacent to the record itself?

  3. According to "Simple requests for individual records are comparatively expensive, with an average of ∼13.3 key read of which ∼7.7 are not for record data. ", does it mean that a simple request reads roughly 6 records on average, plus 6 keys for those records' versions, with the remaining key reads going to other Record Layer metadata?

Thanks,
Zhiyuan Dong

Very cool you found the paper exciting and are taking a look at it! I’ve taken a stab at answering those questions below, but feel free to ask any clarifying questions if things are unclear.

It’s possible I’m misunderstanding the question, but the basic mechanism for a key expression to extract data from a record is this: each key expression specifies the names of all the fields needed for the index or primary key (or whatever the expression is specifying), along with their order, though the order matters less for this question. The RecordMetaData stores a full Protobuf definition for each record type, and from that we can use Protobuf primitives to extract each field by name from the Protobuf record. (The binary format only includes field numbers, not field names, but the Protobuf definition specifies how to match up names and numbers.) The Protobuf record type definition also includes field type information, though we don’t use it directly; instead, we rely on Protobuf to make the correct type casts and serialization/deserialization decisions.
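The name-to-number step can be sketched with a hand-rolled decoder for a small subset of the Protobuf wire format. This is a hypothetical illustration, not Record Layer code: `ORDER_DESCRIPTOR` stands in for a real Protobuf descriptor, the `Order` type is made up, and `evaluate_key_expression` only mimics a concatenation of field key expressions. Length-delimited values are left as raw bytes, where real Protobuf would use the declared field type to decode them.

```python
def encode_varint(n):
    """Encode a non-negative int as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def decode_varint(buf, pos):
    """Read a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_fields(buf):
    """Decode varint (wire type 0) and length-delimited (wire type 2) fields.
    The result is keyed by field number: that is all the raw bytes contain."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[number] = value
    return fields

def evaluate_key_expression(field_names, descriptor, record_bytes):
    """Hypothetical concatenated field expression: map each name to its field
    number via the descriptor, then pull that number out of the decoded bytes.
    Unset fields come back as None (a simplification)."""
    fields = parse_fields(record_bytes)
    return tuple(fields.get(descriptor[name]) for name in field_names)

# Hypothetical type: message Order { int64 order_id = 1; string customer = 2; int64 price = 3; }
ORDER_DESCRIPTOR = {"order_id": 1, "customer": 2, "price": 3}
record = (encode_varint((1 << 3) | 0) + encode_varint(42)
          + encode_varint((2 << 3) | 2) + encode_varint(5) + b"alice"
          + encode_varint((3 << 3) | 0) + encode_varint(300))

print(evaluate_key_expression(["customer", "price"], ORDER_DESCRIPTOR, record))  # (b'alice', 300)
```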

If that wasn’t what you were asking about, though, let me know and I can try again.

Yeah, that’s right. Each record is stored using one or more FDB keys, with the value of each key being the serialized Protobuf data, possibly split across multiple values if the record is large. We then store the version at a special key that’s guaranteed to be the first key of a split record. (Essentially, the record is stored using keys suffixed with (Tuple-encoded) numbers 0 through n, and the version of the record is stored at a key suffixed with -1, which sorts before 0 in the Tuple encoding.)
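A minimal sketch of that layout, assuming plain Python tuples as stand-ins for Tuple-encoded keys (both order -1 before 0, 1, 2, ...), a split threshold of roughly 100 kB, and an opaque byte string standing in for the real version. `record_key_values`, `read_record`, and the in-memory `store` dict are hypothetical, not Record Layer APIs.

```python
CHUNK_SIZE = 100_000  # assumed split threshold (~100 kB), for illustration only

def record_key_values(primary_key, serialized, version=None):
    """Return (key, value) pairs for one record, following the layout described
    above: an optional version key with suffix -1, then the serialized record
    split into chunks with suffixes 0..n."""
    pairs = []
    if version is not None:
        pairs.append((primary_key + (-1,), version))
    chunks = [serialized[i:i + CHUNK_SIZE]
              for i in range(0, len(serialized), CHUNK_SIZE)] or [b""]
    for index, chunk in enumerate(chunks):
        pairs.append((primary_key + (index,), chunk))
    return pairs

def read_record(store, primary_key):
    """Hypothetical range read over every key starting with primary_key.
    Because -1 sorts before 0, the version (if any) comes back first."""
    rows = sorted((k, v) for k, v in store.items()
                  if k[:len(primary_key)] == primary_key)
    version, data = None, bytearray()
    for key, value in rows:
        if key[-1] == -1:
            version = value
        else:
            data += value
    return version, bytes(data)

store = dict(record_key_values((42,), b"x" * 250_000, version=b"v1"))
print(sorted(store))  # [(42, -1), (42, 0), (42, 1), (42, 2)]
```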

The wording on that experiment is a little strange, but requests can ask for more than one record at a time, so I believe you’re basically right that this implies about 6 records per request on average, though there are a few ways this might not be quite right. I think versions actually do count as record data, and large records (>100 kB) are split across multiple keys, so a single record might take more than one FDB key. (For example, a dataset of non-versioned 10 kB records where every request reads 6 records, and a different dataset of versioned 500 kB records where every request reads 1 record, would both read 6 keys of “record data” per request: 6 × 1 key in the first case, and 5 chunks plus 1 version key in the second.)

What doesn’t count as “record data” is the rest of the Record Layer’s meta-data: for example, every store has a key stating which version of the meta-data that store last used, and each store also contains a list of the indexes that haven’t been built yet, etc. Those are the kinds of reads counted in the 7.7 number, which sounds a bit high to me; I think the theoretical number is something like 1 per transaction in an ideal system.
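The arithmetic behind that example can be checked in a few lines, assuming (as above) that versions count as record data and that records are split into roughly 100 kB chunks. The function names are hypothetical.

```python
import math

CHUNK_SIZE_KB = 100  # assumption: records larger than ~100 kB are split

def record_data_keys(record_size_kb, versioned):
    """Keys of 'record data' for one record: one per chunk, plus one version
    key if the record is versioned (counting versions as record data)."""
    chunks = max(1, math.ceil(record_size_kb / CHUNK_SIZE_KB))
    return chunks + (1 if versioned else 0)

def keys_per_request(records_per_request, record_size_kb, versioned):
    return records_per_request * record_data_keys(record_size_kb, versioned)

# Both example datasets read 6 keys of record data per request.
print(keys_per_request(6, 10, versioned=False))  # 6
print(keys_per_request(1, 500, versioned=True))  # 6

# Paper's averages: total key reads minus reads that are not record data.
print(round(13.3 - 7.7, 1))  # 5.6
```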

Thanks for your quick and patient response!

Sorry that my first question was unclear. Let me make it more specific.

As far as I know, an encoded Protobuf message does not contain the mapping from field numbers to field names. To map the names in a key expression to actual values, we need a Protobuf definition (descriptor). So my question is:

When we store a binary value (an encoded Protobuf message) in FDB, do we also encode some type information describing which Protobuf definition should be used for deserialization, such as the type_id or type_definition_id in the CloudKit VLDB’18 paper? Otherwise, how do we know which record type a given Protobuf message belongs to? :thinking:

Thanks,
Zhiyuan Dong

Ah, I see. Now I think I understand.

For a given RecordMetaData’s Protobuf definition, there is exactly one message type, known as the “union descriptor”, that has one field for each top-level record type in the database (see: https://foundationdb.github.io/fdb-record-layer/GettingStarted.html#protobuf-configuration). So, something like:

```protobuf
message RecordTypeOne { }
message RecordTypeTwo { }

message UnionDescriptor {
    option (com.apple.foundationdb.record.record).usage = UNION;
    RecordTypeOne _RecordTypeOne = 1;
    RecordTypeTwo _RecordTypeTwo = 2;
}
```

In this case, there are two types of records that the user may insert into a record store (RecordTypeOne and RecordTypeTwo), and the UnionDescriptor message is essentially “listing” those types.

The Record Layer then wraps every record in a union descriptor message, essentially treating the union descriptor as a oneof. When data are serialized, the user hands a record to the Record Layer, which creates a union descriptor message, sets the appropriate field to the supplied record, and serializes that union descriptor message to the database. When it deserializes a record, it always first deserializes the bytes into an instance of the union descriptor and then checks which field in the union descriptor is set. From the union descriptor definition, it then knows what type the record is.

So, even though the Record Layer allows the user to save a heterogeneous set of records of various types into the database, they are always stored as a homogeneous set of union descriptors, so the same Protobuf deserializer can be used for every record in the store.
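A minimal sketch of the wrapping and unwrapping, using a hand-rolled subset of the Protobuf wire format rather than the real Protobuf library. The field numbers match the UnionDescriptor example above; `wrap_in_union` and `unwrap_union` are hypothetical helpers, and unwrapping assumes exactly one union field is set.

```python
def encode_varint(n):
    """Encode a non-negative int as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def decode_varint(buf, pos):
    """Read a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# Field numbers from the UnionDescriptor example.
UNION_FIELDS = {1: "RecordTypeOne", 2: "RecordTypeTwo"}
FIELD_FOR_TYPE = {name: number for number, name in UNION_FIELDS.items()}

def wrap_in_union(type_name, record_bytes):
    """Serialize a union message whose only set field holds the record."""
    tag = (FIELD_FOR_TYPE[type_name] << 3) | 2  # wire type 2: embedded message
    return encode_varint(tag) + encode_varint(len(record_bytes)) + record_bytes

def unwrap_union(union_bytes):
    """Parse as the union, then use the set field's number to find the type."""
    tag, pos = decode_varint(union_bytes, 0)
    field_number, wire_type = tag >> 3, tag & 0x7
    if wire_type != 2:
        raise ValueError("expected an embedded message")
    length, pos = decode_varint(union_bytes, pos)
    return UNION_FIELDS[field_number], union_bytes[pos:pos + length]

stored = wrap_in_union("RecordTypeTwo", b"")
print(unwrap_union(stored))  # ('RecordTypeTwo', b'')
```

Even an empty record (both example types have no fields) produces a non-empty stored value, because the union's tag and length are still written; that is what keeps the two empty types distinguishable.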

(Note that because of how Protobuf serialization works, this essentially means that every record is assigned a “type number”, which is the field number of that record type in the union descriptor message. The type number (and, less relevantly, the message length) is then stored with the message data, so an alternative approach would have been for each record type definition to simply take a type number, and then we could have concatenated the type number with the serialized Protobuf data for that type. We didn’t do that, partly because it’s easier to let Protobuf handle it for us, but we did take advantage of the existing record type number when the recordType() key expression was created. That expression evaluates to a value that is guaranteed to be unique per record type (for a given meta-data definition), and by default, it’s the union descriptor field number for that record type.)
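To make the comparison concrete, here is a sketch of the bytes each approach would produce for the same serialized payload. `union_encoding` mirrors what Protobuf writes for a set union field; `prefix_encoding` is the hypothetical alternative described above, not something the Record Layer does. The payload is a made-up serialized record.

```python
def encode_varint(n):
    """Encode a non-negative int as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def union_encoding(type_number, payload):
    """Bytes Protobuf writes for a union whose field `type_number` holds an
    already-serialized record: tag (field number + wire type), length, payload."""
    tag = (type_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload

def prefix_encoding(type_number, payload):
    """Hypothetical alternative: prepend the type number ourselves, no union."""
    return encode_varint(type_number) + payload

payload = b"\x08\x2a"  # hypothetical serialized record: field 1 = 42
print(union_encoding(2, payload).hex())   # 1202082a  (tag, length, payload)
print(prefix_encoding(2, payload).hex())  # 02082a    (type number, payload)
```

Both carry the type number in the leading bytes; the union form additionally records the payload length.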

(Also left out of this analysis: in reality, you are allowed to have multiple fields of the same type in the union descriptor. At serialization time, the same field is consistently chosen based on some internal logic, and at deserialization time, the Record Layer looks for whichever field is set and takes the type from that field. You can use this to do things like add a new field to the union descriptor every time you edit the record type, so that the field used on any given record effectively records which version of the record type was current when the record was written.)
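A sketch of that versioning trick, assuming a hypothetical union in which field 3 was added as a newer revision of RecordTypeOne. The write rule here (use the highest-numbered field for the type) is an assumption standing in for the Record Layer's internal logic; any consistent rule would serve the illustration.

```python
def encode_varint(n):
    """Encode a non-negative int as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def decode_varint(buf, pos):
    """Read a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# Hypothetical union: field 3 was added later for an edited RecordTypeOne.
UNION_FIELDS = {1: "RecordTypeOne", 2: "RecordTypeTwo", 3: "RecordTypeOne"}

def field_for_write(type_name):
    """Assumed rule: always write using the highest-numbered field of the type."""
    return max(number for number, name in UNION_FIELDS.items() if name == type_name)

def wrap(type_name, payload):
    tag = (field_for_write(type_name) << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

def read_type(stored):
    """Whichever field is set gives the type; its number tells which revision
    of the type was current when the record was written."""
    tag, _ = decode_varint(stored, 0)
    field_number = tag >> 3
    return UNION_FIELDS[field_number], field_number

old_record = encode_varint((1 << 3) | 2) + encode_varint(0)  # written before field 3 existed
print(read_type(old_record))                  # ('RecordTypeOne', 1)
print(read_type(wrap("RecordTypeOne", b"")))  # ('RecordTypeOne', 3)
```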

Ah, I see.

I really appreciate your detailed responses!!!