Sadly, there are no design docs yet. I will try to answer your questions here over multiple posts.
Internal Key Management
The Document Layer stores each field of a JSON document as a separate FDB key, so a single document is stored under multiple FDB keys. The mapping from a JSON field to an FDB key should be deterministic and maintain the primary key order.
_id is a mandatory field and acts as the primary key. It should be possible to get any document for a given _id and perform range scans on it. Precisely for this reason, the Document Layer keeps the value of _id as part of the FDB key.
The FDB key for a JSON field is formed as
<collection directory prefix> : <primary key> : <field name>
This is best explained with an example. Let’s assume we have a collection called employees and a sample document from that collection is as follows
_id : 1,
name : "Eric",
title : "Engineer",
salary : 10000
For this example, the Document Layer stores FDB keys as follows
0x150b:1:name -> Eric
0x150b:1:title -> Engineer
0x150b:1:salary -> 10000
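As a rough illustration of the mapping above, here is a minimal Python sketch that splits a flat document into one key-value pair per field. It models keys as plain Python tuples and the directory prefix as the bytes b"\x15\x0b"; the function name document_to_kv is made up for this example, and the real Document Layer uses its own binary key encoding rather than tuples.

```python
def document_to_kv(collection_prefix, doc):
    """Split a flat document into one (key, value) pair per non-_id field.

    Keys are modeled as (collection prefix, primary key, field name) tuples.
    This is an illustration only, not the Document Layer's actual encoding.
    """
    primary_key = doc["_id"]  # _id is mandatory and becomes part of every key
    return {
        (collection_prefix, primary_key, field): value
        for field, value in doc.items()
        if field != "_id"
    }


employee = {"_id": 1, "name": "Eric", "title": "Engineer", "salary": 10000}
pairs = document_to_kv(b"\x15\x0b", employee)

# Print the pairs in key order, the way an ordered key-value store would hold them.
for key, value in sorted(pairs.items()):
    print(key, "->", value)
```

Note that _id itself does not get its own key in this sketch, since its value is already carried in every key of the document.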
0x150b is the directory prefix for the employees collection. The Document Layer creates a separate directory prefix for each collection. The 1 in each key is the value of _id for this document. To read all the documents in the collection, the Document Layer issues getRange(0x150b) on FDB. If it needs to read the document whose _id is 5, it issues getRange(0x150b:5). To access just one field, it sends a point get on that field's key.
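To make the three access patterns concrete, here is a small sketch that simulates them over a sorted in-memory list instead of a real FDB cluster. The get_range helper, the tuple keys, and the second employee (_id 5) are all invented for illustration; the actual layer issues range reads through the FDB client API.

```python
from bisect import bisect_left


def get_range(sorted_items, prefix):
    """Return all (key, value) pairs whose tuple key starts with `prefix`.

    `sorted_items` must be sorted by key, mimicking an ordered key-value store.
    """
    keys = [key for key, _ in sorted_items]
    start = bisect_left(keys, prefix)  # first key that is >= the prefix
    result = []
    for key, value in sorted_items[start:]:
        if key[:len(prefix)] != prefix:
            break  # keys are ordered, so nothing later can match
        result.append((key, value))
    return result


# Made-up sample data: two employees plus one key from another collection.
store = sorted({
    (b"\x15\x0b", 1, "name"): "Eric",
    (b"\x15\x0b", 1, "salary"): 10000,
    (b"\x15\x0b", 1, "title"): "Engineer",
    (b"\x15\x0b", 5, "name"): "Ada",
    (b"\x15\x0b", 5, "title"): "Manager",
    (b"\x15\x0c", 1, "name"): "other collection",
}.items())

whole_collection = get_range(store, (b"\x15\x0b",))   # like getRange(0x150b)
one_document = get_range(store, (b"\x15\x0b", 5))     # like getRange(0x150b:5)
one_field = dict(store)[(b"\x15\x0b", 1, "name")]     # like a point get
```

Because the primary key sits right after the collection prefix, all fields of one document are adjacent in key order, which is what lets a single range read return a whole document.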
You can imagine an alternative design where the entire document is stored under a single FDB key. This would reduce the number of keys to read for a document. It would also reduce key duplication, since in the current scheme every field has to store the collection prefix and primary key again. There are three reasons why this may not be a good fit for the Document Layer.
- The size of a document would be limited to the FDB value size (100KB). You could make the scheme a bit dynamic by storing the entire document under one key if the document is small and falling back to the key-per-field approach if the document is larger than 100KB.
- Any change to a single field needs to rewrite the entire document.
- With the document data model, it is very common to have embedded documents and arrays, which are addressed with dot notation that can get quite complicated. With the key-per-field scheme (the scheme we are using now), the FDB key is deterministic and can easily be computed by the query planner. That becomes very complicated with the key-per-document scheme.
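To illustrate the third point, here is a hedged sketch of how a dotted path could map straight to a key under a key-per-field scheme. It assumes that nested fields and array indexes simply extend the key with further path components, which this post does not spell out, so treat the layout as hypothetical. The names flatten and key_for are invented for the example, and empty embedded documents or arrays are ignored for brevity.

```python
def flatten(collection_prefix, doc):
    """Flatten a nested document into {key tuple: scalar value}.

    Assumed layout: (prefix, _id, path component, path component, ...).
    """
    primary_key = doc["_id"]
    out = {}

    def walk(path, value):
        if isinstance(value, dict):
            for name, child in value.items():
                walk(path + (name,), child)
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(path + (index,), child)
        else:
            out[(collection_prefix, primary_key) + path] = value

    for name, value in doc.items():
        if name != "_id":
            walk((name,), value)
    return out


def key_for(collection_prefix, primary_key, dotted_path):
    """Turn a dot-notation path such as 'skills.1' into the matching key."""
    parts = tuple(int(p) if p.isdigit() else p for p in dotted_path.split("."))
    return (collection_prefix, primary_key) + parts


doc = {
    "_id": 1,
    "name": "Eric",
    "address": {"city": "Austin"},
    "skills": ["c++", "python"],
}
flat = flatten(b"\x15\x0b", doc)
city = flat[key_for(b"\x15\x0b", 1, "address.city")]
second_skill = flat[key_for(b"\x15\x0b", 1, "skills.1")]
```

The point of the sketch is that the planner can compute the key for "address.city" without reading anything first. With one key per document, it would have to fetch and decode the whole value before it could resolve the path.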
In the future, we are considering a per-collection configuration to set the key scheme. That way, collections with small documents can benefit from the key-per-document scheme. Once we support schema collections, this becomes an even more interesting feature.
I will try to send another post about the key format for secondary indices tomorrow.