FoundationDB

Looking for layer design documentation


(Sebastian Weyrauch) #1

Hi, is there some high level design overview for the Document Layer?

I’m looking for inspiration for a multi-model data store, and I’m especially interested in document-to-KV data mapping, internal key management, metadata handling, working around the maximum transaction size for bigger documents (the Mongo limit is 16MB), indexing patterns…

Of course I already browsed the source, but I’m a Java guy and comprehending C++ code is nothing I can do on the fly.


(Bhaskar Muppana) #2

Sadly, there are no design docs, yet. I will try to answer your questions here in multiple posts.

Internal Key Management

The Document Layer stores each field of a JSON document as a separate FDB key, so a single document is stored under multiple FDB keys. The mapping from a JSON field to an FDB key should be deterministic and maintain the primary key order.

In Mongo, _id is a mandatory field and acts as the primary key. It should be possible to get any document for a given _id and to perform range scans on it. Precisely for this reason, the Document Layer keeps the value of _id as part of the FDB key.

The FDB key for a JSON field is formed as

<collection directory prefix> : <primary key> : <field name>

This can be best explained with an example. Let’s assume we have a collection called employees and a sample document from that collection is as follows

{
   _id    :     1,
   name   :     "Eric",
   title  :     "Engineer",
   salary :     10000
}

For this example, the Document Layer stores FDB keys as follows

0x150b:1:name -> Eric
0x150b:1:title -> Engineer
0x150b:1:salary -> 10000

0x150b is the directory prefix for the employees collection. The Document Layer creates a separate directory prefix for each collection. 1 is the value of _id for this document. To read all the documents in the collection, the Document Layer issues getRange(0x150b) on FDB. If it needs to read the document whose _id is 5, it issues getRange(0x150b:5). To access just one field, it sends a point get.
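The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for FDB, keys are Python tuples rather than the layer's real byte encoding, and `document_to_kv` and `get_range` are hypothetical helper names, not Document Layer or FDB APIs.

```python
EMPLOYEES = 0x150b  # directory prefix for the employees collection (from the example)

def document_to_kv(prefix, doc):
    """Map each non-_id field to (prefix, primary key, field name) -> value."""
    pk = doc["_id"]
    return {(prefix, pk, name): value for name, value in doc.items() if name != "_id"}

def get_range(store, prefix):
    """Return all (key, value) pairs whose key starts with prefix, in key order."""
    n = len(prefix)
    keys = sorted(key for key in store if key[:n] == prefix)
    return [(key, store[key]) for key in keys]

store = {}  # assumption: a dict stands in for the FDB keyspace
store.update(document_to_kv(EMPLOYEES, {"_id": 1, "name": "Eric", "title": "Engineer", "salary": 10000}))
# Made-up second document, only there to show that range reads stay per-document.
store.update(document_to_kv(EMPLOYEES, {"_id": 5, "name": "Ana", "title": "Manager", "salary": 12000}))

whole_collection = get_range(store, (EMPLOYEES,))        # like getRange(0x150b)
document_5 = dict(get_range(store, (EMPLOYEES, 5)))      # like getRange(0x150b:5)
one_field = store[(EMPLOYEES, 1, "name")]                # like a point get
```

Because the primary key sits right after the collection prefix, all fields of one document are adjacent in key order, which is what makes the per-document range read work.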

You can imagine an alternative design where the entire document is stored under a single FDB key. That would reduce the number of keys to read for a document. It would also reduce key duplication, since with a key per field every field has to store the entire key prefix again. There are three reasons why this may not be a good fit for the Document Layer.

  • The size of the document would be limited to the FDB value size (100KB). You could make the scheme a bit dynamic by storing the entire document under one key if the document is small and falling back to the key per field approach if the document is larger than 100KB.
  • Any change to a single field needs to rewrite the entire document.
  • With the document data model, it is very common to have embedded documents and arrays, which are addressed with dot notation that can get very complicated. With the key per field scheme (the scheme we are using now), the FDB key is deterministic and can easily be computed by the query planner. That becomes very complicated with the key per document scheme.
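To illustrate the third point, here is a rough sketch of how a dotted path can be turned directly into a key under a key per field scheme. The thread does not describe how the Document Layer actually encodes embedded documents and arrays, so the encoding below (one tuple element per path step, array positions as integers) is purely an assumption, and `flatten` and `key_for` are hypothetical names.

```python
def flatten(value, path=()):
    """Yield (path, leaf value) pairs for nested dicts and lists.

    Note: empty dicts and lists yield nothing in this simplified sketch.
    """
    if isinstance(value, dict):
        for name, child in value.items():
            yield from flatten(child, path + (name,))
    elif isinstance(value, list):
        for position, child in enumerate(value):
            yield from flatten(child, path + (position,))
    else:
        yield path, value

def key_for(prefix, pk, dotted_path):
    """Compute the key for a dotted path such as 'skills.1' without reading anything."""
    steps = tuple(int(s) if s.isdigit() else s for s in dotted_path.split("."))
    return (prefix, pk) + steps

PREFIX = 0x150b
# Made-up document with an embedded document and an array.
doc = {"_id": 1, "name": "Eric", "address": {"city": "Berlin"}, "skills": ["c++", "java"]}
body = {name: value for name, value in doc.items() if name != "_id"}
store = {(PREFIX, doc["_id"]) + path: value for path, value in flatten(body)}

city = store[key_for(PREFIX, 1, "address.city")]
second_skill = store[key_for(PREFIX, 1, "skills.1")]
```

With a key per document scheme, the same lookup would require reading and decoding the whole stored document first.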

In the future, we are considering having a per-collection configuration to set the key scheme. That way, collections with small documents can benefit from the key per document scheme. Once we support schema collections, this becomes an even more interesting feature.

I will try to send another post about the key format for secondary indices, tomorrow.


(Sebastian Weyrauch) #3

Concerning the mapping, this is the obvious first solution, but it has some shortcomings:

  • document order is not maintained - fine with JSON but not with e.g. XML
  • duplicate names won’t work - this is discouraged but still not illegal JSON use
  • besides the 10MB transaction (= document?) size limit, it imposes a 100KB limit per value

Thanks anyway! Looking forward to coming posts…


(Bhaskar Muppana) #4

I assume you are talking about the order of fields within a document. The order of documents within a collection is maintained by their primary key. The insertion order of fields within a document is not maintained; fields are ordered lexicographically. The Document Layer is designed specifically to store JSON documents only. XML was never a goal.

Yeah, the Document Layer doesn’t allow duplicate field names. I’m not sure that’s a desired feature. How would updates actually work with duplicate field names? Isn’t it better to use arrays? I believe MongoDB doesn’t support duplicate fields either.

The 100KB limit per value is one of the limitations we want to fix soon. Here is the issue on it.


(Bhaskar Muppana) #5

Secondary indices

Based on the primary index key mapping, it is easy to guess the key mapping for secondary indices.

<index prefix> : <index key> : <primary key>

For the sample JSON document from the previous post, if there is a secondary index on the field name, the index entry would look like

0x15ab:name_1:Eric:1  ->  null

The Document Layer keeps the primary key in the FDB key to deal with duplicate index keys. It doesn’t store anything in the value for indices. So, a query that uses a secondary index needs to dereference the document using the primary key found in the index entry. This can be optimized by providing covered indices, which store either the entire document or part of it as the value of the index entry. We don’t have covered indices yet.
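A small sketch of that index layout and the dereference step, again with a dict standing in for FDB and tuples standing in for the real key encoding. The helper names (`insert`, `scan`, `find_by_name`) are hypothetical, and the extra documents are made up for illustration.

```python
PRIMARY = 0x150b  # collection prefix from the earlier example
INDEX = 0x15ab    # index prefix from the example above

def insert(store, doc, indexed_field="name"):
    """Write one key per field, plus one index entry with an empty value."""
    pk = doc["_id"]
    for field, value in doc.items():
        if field != "_id":
            store[(PRIMARY, pk, field)] = value
    if indexed_field in doc:
        # The primary key is the last key element, so equal index keys do not collide.
        store[(INDEX, indexed_field + "_1", doc[indexed_field], pk)] = None

def scan(store, prefix):
    """Return the keys that start with prefix, in key order."""
    n = len(prefix)
    return sorted(key for key in store if key[:n] == prefix)

def find_by_name(store, name):
    """Index scan first, then dereference each primary key found in the index."""
    docs = []
    for index_key in scan(store, (INDEX, "name_1", name)):
        pk = index_key[-1]
        fields = {key[2]: store[key] for key in scan(store, (PRIMARY, pk))}
        docs.append({"_id": pk, **fields})
    return docs

store = {}
insert(store, {"_id": 1, "name": "Eric", "title": "Engineer"})
insert(store, {"_id": 3, "name": "Ana", "title": "Manager"})
insert(store, {"_id": 7, "name": "Eric", "title": "Designer"})
erics = find_by_name(store, "Eric")
```

A covered index would put the needed fields into the index value instead of `None`, which would save the second read per match.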

Size Limitations

As you mentioned, the transaction size limits the size of the document, since we try to insert the complete document in a single transaction. Even though a transaction can be as big as 10MB, I think it’s better to limit the document size to 1MB to be safe. This could certainly be a limitation for some use cases; the hope is that the majority of use cases have smaller documents, and that large objects are kept in some kind of blob store.

Note: It is always possible to work around these limitations by splitting the insert of a document into an insert and one or more updates. But that may not be a desirable solution for applications.
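The workaround in the note could look roughly like the sketch below on the application side. Everything here is an assumption for illustration: `split_document` is a hypothetical helper, the JSON-encoded length is only a crude stand-in for the real transaction size, and the byte budget is arbitrary.

```python
import json

def split_document(doc, budget_bytes):
    """Split a document into one initial insert plus follow-up updates.

    Each piece carries fields whose approximate encoded size fits the budget.
    """
    batches, current, used = [], {}, 0
    for field, value in doc.items():
        if field == "_id":
            continue
        size = len(json.dumps({field: value}).encode("utf-8"))
        if size > budget_bytes:
            raise ValueError(f"field {field!r} alone exceeds the budget")
        if current and used + size > budget_bytes:
            batches.append(current)
            current, used = {}, 0
        current[field] = value
        used += size
    if current:
        batches.append(current)
    first = batches[0] if batches else {}
    insert_doc = {"_id": doc["_id"], **first}
    updates = batches[1:]  # each would be applied as a separate update on the same _id
    return insert_doc, updates

doc = {"_id": 1, "a": "x" * 50, "b": "y" * 50, "c": "z" * 50}
insert_doc, updates = split_document(doc, budget_bytes=120)
```

The pieces are written in separate transactions, so a concurrent reader can observe a partially written document. That is one reason this may not be acceptable for an application.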

FDB Transactions

Every request on the Document Layer uses one or more FDB transactions to read/update state in the FDB keyspace. Running an entire request under a single transaction gives us better isolation guarantees. But depending on the kind of request this may not be possible, considering that FDB transactions have a 5-second limit.

The Document Layer requires certain operations, such as metadata updates and inserts, to run in a single transaction. If such an operation can’t finish in a single transaction, the layer sends an error back to the application. Queries with predicates work differently: since a predicate can match the entire collection, an operation that takes too long is divided into multiple transactions.
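The idea of dividing a long scan into several transactions can be sketched as a resume-from-last-key loop. This is not the Document Layer's implementation: a dict stands in for FDB, a per-transaction key limit stands in for the 5-second time limit, and `read_chunk` and `scan_with_predicate` are hypothetical names.

```python
def read_chunk(store, prefix, after, limit):
    """One simulated transaction: up to `limit` keys under prefix, strictly after `after`."""
    n = len(prefix)
    keys = sorted(k for k in store if k[:n] == prefix and (after is None or k > after))
    return keys[:limit]

def scan_with_predicate(store, prefix, predicate, limit):
    """Scan a whole range in several bounded transactions, resuming after the last key read."""
    matches, after, transactions = [], None, 0
    while True:
        chunk = read_chunk(store, prefix, after, limit)
        if not chunk:
            break
        transactions += 1
        matches.extend(k for k in chunk if predicate(store[k]))
        after = chunk[-1]  # continuation point for the next transaction
    return matches, transactions

PREFIX = 0x150b
# Made-up data: ten documents with only a salary field.
store = {(PREFIX, pk, "salary"): pk * 1000 for pk in range(1, 11)}
matches, transactions = scan_with_predicate(store, (PREFIX,), lambda v: v >= 8000, limit=4)
```

Each chunk reads from its own snapshot, so writes that land between chunks may or may not be seen. That is the weaker isolation mentioned above.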

Metadata management

Collection metadata is maintained as FDB keys, and a metadata version is maintained for each collection. The version gets updated every time the metadata changes, for example when a new index is created. The Document Layer caches metadata and uses the metadata version to invalidate the cache: every operation on a collection first reads the metadata version to make sure it matches the cached copy, and if it does not, fetches the latest metadata.

This is a very simple approach. Even though it avoids reading the metadata again and again, every operation still has to read the metadata version, which makes the version key hot. There are multiple optimizations we can apply to reduce the hotness and to avoid the latency hit. I will raise issues on them soon with details.
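A minimal sketch of version-checked metadata caching as described here. The key layout (`(collection, "version")` and `(collection, "metadata")`), the class name `MetadataCache` and the helper `create_index` are assumptions made for this example; a dict stands in for FDB.

```python
import copy

class MetadataCache:
    def __init__(self, store):
        self.store = store
        self.cached = {}      # collection -> (version, metadata)
        self.full_reads = 0   # how often the full metadata had to be re-read

    def get(self, collection):
        # Read on every operation: this is the key that becomes hot.
        version = self.store[(collection, "version")]
        hit = self.cached.get(collection)
        if hit is None or hit[0] != version:
            metadata = copy.deepcopy(self.store[(collection, "metadata")])
            self.full_reads += 1
            self.cached[collection] = (version, metadata)
        return self.cached[collection][1]

def create_index(store, collection, index_name):
    """A metadata change bumps the version so that caches notice it."""
    store[(collection, "metadata")]["indexes"].append(index_name)
    store[(collection, "version")] += 1

store = {("employees", "version"): 1, ("employees", "metadata"): {"indexes": []}}
cache = MetadataCache(store)
cache.get("employees")
cache.get("employees")               # served from the cache, only the version is read
reads_before_change = cache.full_reads
create_index(store, "employees", "name_1")
indexes_after_change = cache.get("employees")["indexes"]
```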

Hopefully, I answered all your questions. This project is just getting started. It can use a lot of optimizations. And of course, features. We are happy to discuss any ideas you might have. PRs are always welcome :)


(Amirouche) #6

I have been into multi-model databases for some time now. It started 5 years ago when I wanted the flexibility of a graphdb with the power of lucene. Of course, it did not work out well until I discovered wiredtiger and later FDB. I am still trying to put my ideas together. Ramblings ahead.

What I had in mind in the beginning was to bolt together Document, Graph and KV, with values serialized with msgpack, plus an Inverted Index for full text search, like arangodb does. But it doesn’t work, mostly because of foreign keys (or references if you prefer): you cannot cascade delete / update easily without putting a facade on top, like mongodb does. That is, you have a constrained abstraction (documents inside collections) and declare indices for each collection, which are stored as the collection’s metadata. Indices have different behaviours, i.e. they implement various indexing schemes, and the layer takes responsibility for cascading deletes / updates to the indices when a document changes, via observers I guess.

I am trying something else. Instead of a facade or an abstraction that doesn’t leak, I am trying to code “inside out”, where a minimal understanding of how FDB actually works is required to be able to use (and extend) the layer(s) I am working on. Why? Because, AFAIK, with a closed abstraction like the Document layer you cannot tap into the lower-level FDB KV space to grab an extra performance boost. For instance, the layer I am working on preserves lexicographic ordering.

Mind the fact that the Document layer is still useful if you want to scale an existing mongodb project. I am in a different situation where I am coding from scratch.

Contrary to what you wrote, the design decision made in the Document layer is not the most obvious one, and actually I don’t know why that choice was made. My initial design was using msgpack to store the JSON documents.

What do you mean by “internal key”?

You can use a subspace to store all metadata about your layer, like the Directory layer does. Also, FWIW, I don’t use the Directory layer. It seems too dangerous. Instead, I use an integer from an Enum as a key prefix.
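A sketch of the Enum-as-prefix idea, to make it concrete. The member names, the single-byte prefix and the NUL-joined key encoding are all assumptions for this illustration; a real layer would more likely use a proper tuple encoding for the rest of the key.

```python
from enum import IntEnum

class Subspace(IntEnum):
    # Hypothetical subspaces; the integer value becomes the first byte of every key.
    METADATA = 0
    DOCUMENTS = 1
    INDEXES = 2

def make_key(subspace, *parts):
    """Prefix the key with the subspace byte, then join the parts naively with NUL bytes."""
    return bytes([subspace]) + b"\x00".join(part.encode("utf-8") for part in parts)

metadata_key = make_key(Subspace.METADATA, "employees", "version")
document_key = make_key(Subspace.DOCUMENTS, "employees", "1", "name")
index_key = make_key(Subspace.INDEXES, "employees", "name_1", "Eric", "1")
ordered = sorted([index_key, document_key, metadata_key])
```

Since keys sort by their first byte, each subspace occupies one contiguous key range, and the prefixes are fixed in code rather than allocated at runtime.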

See Object store on FoundationDB and Missing information in documentation about how to store blobs

See:

You might also take advantage of the fact that each record key is unique or otherwise said you might not need to use the value part.


(Amirouche) #7