Sadly, there are no design docs yet. I will try to answer your questions here in multiple posts.
Internal Key Management
The Document Layer stores each field of a JSON document under a separate FDB key, so a single document is stored under multiple FDB keys. The mapping from a JSON field to an FDB key must be deterministic and must preserve primary key order.
In Mongo, _id is a mandatory field and acts as the primary key. It should be possible to get any document for a given _id and to perform range scans on it. Precisely for this reason, the Document Layer keeps the value of _id as part of the FDB key.
The FDB key for a JSON field is formed as:
<collection directory prefix> : <primary key> : <field name>
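Primary key order is preserved only if the byte encoding of each key component sorts the same way as the values themselves; a naive text encoding does not (the string "10" sorts before "9"). The sketch below shows one way to build such keys, assuming a simplified encoding loosely modeled on the FoundationDB tuple layer (fixed-width big-endian integers, NUL-terminated strings). The type codes and the helpers encode_component and field_key are hypothetical and are not the Document Layer's actual on-disk format.

```python
import struct

# Hypothetical type codes, loosely modeled on the FoundationDB tuple layer.
# This is NOT the Document Layer's real key format.
INT_CODE = b"\x15"
STR_CODE = b"\x02"


def encode_int(n: int) -> bytes:
    # Shift the signed 64-bit range into unsigned so that big-endian
    # byte order matches numeric order (negatives sort first).
    return INT_CODE + struct.pack(">Q", n + (1 << 63))


def encode_str(s: str) -> bytes:
    # Escape embedded NULs so the 0x00 terminator stays unambiguous.
    return STR_CODE + s.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"


def encode_component(v) -> bytes:
    if isinstance(v, bool):
        raise TypeError("bools are not handled in this sketch")
    if isinstance(v, int):
        return encode_int(v)
    if isinstance(v, str):
        return encode_str(v)
    raise TypeError(f"unsupported key component: {type(v).__name__}")


def field_key(prefix: bytes, pk, field: str) -> bytes:
    """<collection directory prefix> : <primary key> : <field name>"""
    return prefix + encode_component(pk) + encode_str(field)


prefix = b"\x15\x0b"  # stands in for the 0x150b directory prefix
assert b"9" > b"10"  # plain text bytes get numeric order wrong...
assert field_key(prefix, 9, "name") < field_key(prefix, 10, "name")  # ...this does not
```

With this encoding every field of one document shares the byte prefix `prefix + encode_component(pk)`, which is what makes a per-document range read possible.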
This is best explained with an example. Let's assume we have a collection called employees, and a sample document from that collection is as follows:
{
_id : 1,
name : "Eric",
title : "Engineer",
salary : 10000
}
For this example, the Document Layer stores the following FDB keys:
0x150b:1:name -> Eric
0x150b:1:title -> Engineer
0x150b:1:salary -> 10000
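The mapping for a flat document like the one above can be sketched as follows. This assumes the human-readable colon-separated key form used in this post (real FDB keys are binary), and document_to_kvs is a hypothetical helper. As in the example, _id does not get a key of its own because its value is already part of every key.

```python
from typing import Any


def document_to_kvs(prefix: str, doc: dict[str, Any]) -> list[tuple[str, Any]]:
    """Map one flat document to (key, value) pairs, one per field (hypothetical helper)."""
    pk = doc["_id"]  # _id is mandatory; a missing _id raises KeyError here
    return [
        (f"{prefix}:{pk}:{field}", value)
        for field, value in doc.items()
        if field != "_id"  # the _id value already lives inside every key
    ]


employee = {"_id": 1, "name": "Eric", "title": "Engineer", "salary": 10000}
for key, value in document_to_kvs("0x150b", employee):
    print(key, "->", value)
```

Running it prints the same three lines as the listing above.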
0x150b is the directory prefix for the employees collection; the Document Layer creates a separate directory prefix for each collection. The 1 in each key is the value of _id for this document. To read all the documents in the collection, the Document Layer issues getRange(0x150b) on FDB. If it needs to read the document whose _id is 5, it issues getRange(0x150b:5). To access just one field, it sends a point get.
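To see why embedding the primary key in the key makes these reads cheap, the sketch below models FDB's ordered key space with an in-memory sorted list. It is a toy, assuming Python tuples as keys (they compare lexicographically, much like FDB's byte-ordered keys); OrderedKV, get and get_range are hypothetical names, and the real FDB range read takes a begin and an end key rather than a prefix.

```python
import bisect
from typing import Any


class OrderedKV:
    """Toy, in-memory stand-in for FDB's sorted key space (hypothetical)."""

    def __init__(self) -> None:
        self._keys: list[tuple] = []  # kept sorted at all times
        self._vals: dict[tuple, Any] = {}

    def set(self, key: tuple, value: Any) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: tuple) -> Any:
        # Point get: one key, one value (None if absent).
        return self._vals.get(key)

    def get_range(self, prefix: tuple) -> list[tuple[tuple, Any]]:
        # A tuple sorts before all of its extensions, so every key under
        # `prefix` forms one contiguous run starting at bisect_left(prefix).
        i = bisect.bisect_left(self._keys, prefix)
        n = len(prefix)
        out = []
        while i < len(self._keys) and self._keys[i][:n] == prefix:
            k = self._keys[i]
            out.append((k, self._vals[k]))
            i += 1
        return out


db = OrderedKV()
for pk, name in [(1, "Eric"), (5, "Ana"), (12, "Raj")]:
    db.set(("0x150b", pk, "name"), name)
    db.set(("0x150b", pk, "title"), "Engineer")

whole_collection = db.get_range(("0x150b",))  # like getRange(0x150b)
one_document = db.get_range(("0x150b", 5))    # like getRange(0x150b:5)
one_field = db.get(("0x150b", 1, "name"))     # point get
```

Because each document's fields are contiguous in key order, reading a document costs one range read instead of one lookup per field.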
You can imagine an alternative design where the entire document is stored under a single FDB key. This would reduce the number of keys to read per document. It would also reduce key duplication, since with one key per field every field's key repeats the collection prefix and the primary key. There are three reasons why this may not be a good fit for the Document Layer:
- The size of a document would be limited to the FDB value size limit (100KB). You could make the scheme a bit dynamic by storing the entire document under one key if it is small and falling back to the key-per-field approach if it is larger than 100KB.
- Any change to a single field would require rewriting the entire document.
- With the document data model, embedded documents and arrays are very common, which leads to complicated dot-notation paths. With the key-per-field scheme (the scheme we use now), the FDB key for a path is deterministic and easy for the query planner to compute. That becomes much more complicated with a key-per-document scheme.
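One way to keep keys deterministic for embedded documents and arrays is to turn each path segment into its own key component, so a dot-notation path such as address.city maps directly to a key. This is a minimal sketch under that assumption; the Document Layer's real encoding of nested fields may differ, and flatten, document_keys and key_for_dot_path are hypothetical helpers.

```python
from typing import Any, Iterator


def flatten(value: Any, path: tuple = ()) -> Iterator[tuple[tuple, Any]]:
    """Yield (path, leaf) pairs; objects and arrays recurse, anything else is a leaf."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from flatten(v, path + (k,))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from flatten(v, path + (i,))  # array index becomes a key component
    else:
        yield path, value


def document_keys(prefix: str, doc: dict) -> dict[tuple, Any]:
    # One key per leaf: (prefix, _id, *path) -> leaf value.
    pk = doc["_id"]
    body = {k: v for k, v in doc.items() if k != "_id"}
    return {(prefix, pk) + path: leaf for path, leaf in flatten(body)}


def key_for_dot_path(prefix: str, pk: Any, dot_path: str) -> tuple:
    # "address.city" -> (prefix, pk, "address", "city").
    # Simplification: all-digit segments are treated as array indices.
    parts = tuple(int(p) if p.isdigit() else p for p in dot_path.split("."))
    return (prefix, pk) + parts


doc = {
    "_id": 1,
    "name": "Eric",
    "address": {"city": "SF", "zip": "94105"},
    "skills": ["go", "fdb"],
}
keys = document_keys("0x150b", doc)
city = keys[key_for_dot_path("0x150b", 1, "address.city")]
```

Because the key for any dot path can be computed without reading the document, a lookup of address.city on a known _id becomes a single point get. The sketch drops empty objects and arrays and cannot distinguish a field literally named "0" from an array index; a real implementation would need to handle both.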
In the future, we are considering a per-collection configuration to set the key scheme. That way, collections with small documents can benefit from the key-per-document scheme. Once we support schema collections, this becomes an even more interesting feature.
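A per-collection key scheme, combined with the size-based fallback from the first bullet above, could look roughly like this. It is a sketch under stated assumptions: the scheme comes from a hypothetical per-collection setting (KeyScheme), documents are serialized as JSON purely for simplicity, and FDB_VALUE_LIMIT uses FoundationDB's 100,000-byte value size limit. None of these names come from the Document Layer.

```python
import json
from enum import Enum
from typing import Any

FDB_VALUE_LIMIT = 100_000  # FoundationDB's value size limit, in bytes


class KeyScheme(Enum):
    # Hypothetical per-collection setting.
    PER_FIELD = "per_field"
    PER_DOCUMENT = "per_document"


def encode_document(prefix: str, doc: dict, scheme: KeyScheme) -> dict[tuple, Any]:
    pk = doc["_id"]
    if scheme is KeyScheme.PER_DOCUMENT:
        # JSON is a stand-in serialization for this sketch only.
        blob = json.dumps(doc, separators=(",", ":")).encode("utf-8")
        if len(blob) <= FDB_VALUE_LIMIT:
            return {(prefix, pk): blob}  # whole document under one key
        # Too large for a single FDB value: fall through to key-per-field.
    return {(prefix, pk, field): value for field, value in doc.items() if field != "_id"}


small = {"_id": 1, "name": "Eric"}
large = {"_id": 2, "notes": "x" * 200_000}
per_doc_small = encode_document("0x150b", small, KeyScheme.PER_DOCUMENT)
per_doc_large = encode_document("0x150b", large, KeyScheme.PER_DOCUMENT)
```

With the fallback, readers must handle both layouts under a given _id (a single value at the document key, or one key per field), since either can occur in the same collection.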
I will try to send another post about the key format for secondary indices tomorrow.