Document Layer Performance

performance
#1

The Document Layer looks very promising; however, is there a performance overhead to using it?

  1. What are some real-world performance expectations? (operations/sec, latency, CPU usage)

  2. How much overhead does the Document Layer produce compared to the standard FoundationDB API?

Any suggestions, advice or experiences would be greatly appreciated. Thanks!

(Bhaskar Muppana) #2

Sorry for the delayed response. I keep procrastinating on this.

We don’t have any official benchmarking numbers yet. We have done some performance tests for specific use cases, but those are hard to reason about without digging into each use case. I can, however, try to answer the second question. When it comes to layers, overhead is the most important aspect, as FoundationDB performance in general is better understood and something we can rely upon.

For the sake of completeness, let’s state a couple of known facts about the Document Layer:

  • Document Layer is a stateless layer and keeps all the state in FoundationDB.
  • Document Layer is MongoDB compatible, so it receives MongoDB queries and converts them into FoundationDB operations.

If we know the cost of FoundationDB operations and the mapping of MongoDB requests to FoundationDB operations, that would give us an idea about the cost/overhead of Document Layer.

FoundationDB operations

  • getReadVersion() - usually a couple of milliseconds
  • commit() - a couple of milliseconds, a bit more expensive than getReadVersion()
  • get() - sub-millisecond
  • getRange() - cost grows with the number of keys fetched
  • set() or clear() - insignificant, as they are just in-memory operations. We can ignore these when it comes to performance; although it’s quite easy to abuse them and kill the cluster, let’s assume we are not hitting those cases.

Every FoundationDB transaction has to do at least getReadVersion() and commit(). Of course, if the transaction is read-only, you wouldn’t do the commit. It’s also possible to avoid getting a new read version for every transaction by reusing a previously committed or cached read version, at the cost of causal consistency.

MongoDB - FoundationDB mapping

There are two aspects of the mapping that mainly affect performance:

  • How MongoDB requests are mapped to FoundationDB transactions
    • This decides how often we have to pay the transaction overhead - getReadVersion() and commit(). It also decides the consistency guarantees the Document Layer provides.
  • How documents are stored in FoundationDB keys
    • This decides how many keys we have to fetch to respond to MongoDB queries.

To keep the discussion focused, let’s stick to CRUD operations only.

MongoDB requests to FoundationDB transactions

To provide stronger consistency guarantees, the Document Layer tries to complete a request within a single FDB transaction. But, given the limits on FDB transaction size and duration, it’s not always possible to complete a request in a single transaction. Different requests are handled differently:

  • Schema update - Like create or delete indexes/collections. Forced to complete in a single transaction to maintain consistency of the schema.
  • Insert - Each insert request is forced to complete in a single transaction. This includes updating indexes as well if any exist. Even bulk inserts are forced to complete in a single transaction.
  • Update - Depending on the filter, a single update request might need to update many documents (even an entire collection). If an update takes longer than 3 seconds, the Document Layer splits it into multiple FoundationDB transactions.
  • Deletes - same as updates
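The 3-second splitting for updates and deletes can be pictured as a simple time-boxing loop. This is a hypothetical sketch, not the Document Layer’s actual code; `per_doc_seconds` stands in for whatever each document’s update actually costs:

```python
# Hypothetical time-boxed splitting: keep applying per-document work until
# the transaction budget would be exceeded, then start a new transaction.
def split_into_transactions(doc_ids, per_doc_seconds, budget_seconds=3.0):
    batches, current, elapsed = [], [], 0.0
    for doc_id in doc_ids:
        if current and elapsed + per_doc_seconds > budget_seconds:
            batches.append(current)          # "commit" and start a new txn
            current, elapsed = [], 0.0
        current.append(doc_id)
        elapsed += per_doc_seconds
    if current:
        batches.append(current)
    return batches
```

Each inner list would correspond to one FoundationDB transaction, which is why a huge update is not atomic once it spills past the budget.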

JSON documents to FoundationDB Keys

Each JSON field is stored in a single FoundationDB key. If a document has 10 fields, it is stored in 10 FoundationDB keys. Inserting that document would cause 10 sets on FDB, and reading the entire document causes a getRange() which fetches 10 FDB keys. For small enough documents, getRange() wouldn’t be that expensive compared to get(). But, if your document gets quite big (in terms of the number of fields), it could get expensive.
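A minimal sketch of this per-field layout, for intuition only - the `'id:field'` string keys below are a simplification, not the Document Layer’s real key encoding:

```python
# Illustrative flattening of a flat document into one key per field.
# The 'id:field' string keys are a stand-in for the real encoding.
def doc_to_kv(doc):
    doc_id = doc['_id']
    return {f"{doc_id}:{field}": value
            for field, value in doc.items()
            if field != '_id'}

kv = doc_to_kv({'_id': 'Bhaskar', 'section': 'A', 'marks': 90})
# A 3-field document becomes 2 keys (the _id lives in the key prefix), so
# inserting it costs 2 set()s and reading it back is a single getRange().
```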

What does it mean?

If you do a simple insert

db.coll.insert({'_id': 'Bhaskar', 'section': 'A', 'marks': 90})

That would generate

  • getReadVersion()
  • Read schema version
  • set('Bhaskar:section', 'A')
  • set('Bhaskar:marks', 90)
  • commit()

One surprise here is the schema version read. The Document Layer maintains the schema in FoundationDB and, on each and every request, makes sure the schema hasn’t changed since it was last read by reading the schema version. It’s just another key in FDB.

That comes to around 5 to 7 ms. These are very approximate numbers; latency depends a lot on your FDB cluster setup.

Reads would do better than this:

db.coll.find({'_id': 'Bhaskar'})

That would generate:

  • getReadVersion()
  • Read schema version
  • getRange('Bhaskar:')
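Reassembling the document from the key-value pairs that getRange('Bhaskar:') returns is then a simple fold. This toy sketch uses the same simplified 'id:field' string keys as the insert example above (the real encoding differs):

```python
# Toy reconstruction of a document from getRange() results, assuming the
# simplified 'id:field' string keys used in this post's examples.
def kv_to_doc(doc_id, kv_pairs):
    doc = {'_id': doc_id}
    for key, value in kv_pairs:
        _, field = key.split(':', 1)   # strip the 'id:' prefix
        doc[field] = value
    return doc

doc = kv_to_doc('Bhaskar', [('Bhaskar:section', 'A'), ('Bhaskar:marks', 90)])
```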

Techniques to mitigate latencies

  • Do batch operations - Bulk inserts help amortize the transaction overhead across many documents.
  • Avoid reading the schema version - Improvements in FoundationDB 6.1 make it possible to avoid reading the schema version separately.
  • Explicit transactions - MongoDB requests can’t batch read-modify-write operations; explicit transactions make this possible by giving the application control over transactions.
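To see why batching helps, amortize the fixed transaction overhead over the batch. The numbers below are the same assumed rough costs used earlier in this post, not measurements:

```python
# Back-of-envelope amortization of per-transaction overhead for bulk inserts.
# txn_overhead_ms (getReadVersion + commit) and per_doc_ms are assumptions.
def per_doc_latency_ms(batch_size, txn_overhead_ms=5.0, per_doc_ms=0.1):
    # Overhead is paid once per transaction; the in-memory set()s per
    # document are nearly free by comparison.
    return (txn_overhead_ms + batch_size * per_doc_ms) / batch_size

single = per_doc_latency_ms(1)     # ~5.1 ms per document
batched = per_doc_latency_ms(100)  # ~0.15 ms per document
```

Under these assumptions, a batch of 100 cuts the effective per-document cost by more than 30x, which is the whole argument for bulk inserts.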