1 M rows write via Mongo client using document layer

vaggarwal · February 19, 2019, 9:32pm

Writing 1M rows via a Mongo client through the Document Layer is very slow compared with loading them into a MongoDB database: 26 minutes versus 7 minutes.

Here is the script that inserts the documents:

for(var i = 1; i <= 1000000 ; i++) {
      db.mycollections.insert({
          "_id" : i,
          "item" : "canvas" + i, 
          "qty" : 100 + i,
          "tags" : ["cotton"],
          "title" : "How do I create manual workload i.e., Bulk inserts to Collection ",
          " Iteration no:" : i
      });
}

Are any changes needed to make this run faster?

mbhaskar · February 19, 2019, 9:52pm

Document Layer creates a separate FoundationDB transaction for each and every request and then commits it. This gives strong consistency guarantees for the data. At the same time, each transaction has some overhead, such as getting the read version and committing the transaction. On top of that, every request needs to read the metadata version to make sure the metadata hasn’t changed.

One easy way to get better performance would be to use bulk inserts. You can use insert_many() (insertMany() in the mongo shell) and insert multiple documents in each request. That should help with the performance.
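
A minimal sketch of the batching idea, in the same JavaScript as the original script. The helper names `makeDoc` and `insertInBatches` and the batch size of 1000 are illustrative assumptions, not tuned values. The code is written synchronously, the way the legacy mongo shell runs `insertMany()`.

```javascript
// Build the same document shape as the original loop.
function makeDoc(i) {
  return {
    _id: i,
    item: "canvas" + i,
    qty: 100 + i,
    tags: ["cotton"],
    title: "How do I create manual workload i.e., Bulk inserts to Collection ",
    " Iteration no:": i
  };
}

// Insert documents 1..total, sending one insertMany() request per batch.
// `collection` is anything with an insertMany(docs) method,
// e.g. db.mycollections in the mongo shell (assumed synchronous here).
// Returns the number of requests that were sent.
function insertInBatches(collection, total, batchSize) {
  var requests = 0;
  for (var start = 1; start <= total; start += batchSize) {
    var end = Math.min(start + batchSize - 1, total);
    var docs = [];
    for (var i = start; i <= end; i++) {
      docs.push(makeDoc(i));
    }
    collection.insertMany(docs);
    requests++;
  }
  return requests;
}
```

In the mongo shell this would be called as `insertInBatches(db.mycollections, 1000000, 1000)`, which sends 1000 requests instead of 1M. Batches should stay modest in size, since FoundationDB limits how large and how long-running a single transaction can be.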

SteavedHams · February 19, 2019, 11:35pm

An alternative to inserting many documents per transaction would be to do many single-document insert transactions in parallel, which will hide the latency of starting and committing transactions. You wouldn’t want infinite parallelism, but a pattern where, for example, you dispatch N inserts initially and then dispatch a new insert each time a dispatched insert completes should work well. I’m not sure what the ideal value of N is, at least hundreds and possibly low thousands.
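
The dispatch pattern described above could be sketched roughly as follows. This assumes an asynchronous client (for example the Node.js driver, where `collection.insertOne(doc)` returns a Promise), not the synchronous legacy mongo shell. The names `insertWithWindow`, `insertOne`, and `maxInFlight` are hypothetical.

```javascript
// Run inserts 1..total with at most `maxInFlight` outstanding at once.
// `insertOne(i)` must return a Promise that settles when insert i completes,
// e.g. (i) => collection.insertOne({ _id: i }) with an async driver.
async function insertWithWindow(insertOne, total, maxInFlight) {
  let next = 1;

  // Each worker dispatches a new insert as soon as its previous one
  // completes, so the number of in-flight inserts never exceeds the
  // number of workers.
  async function worker() {
    while (next <= total) {
      const i = next++;
      await insertOne(i);
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(maxInFlight, total); w++) {
    workers.push(worker());
  }
  // Rejects if any insert fails; error handling and retries are left out.
  await Promise.all(workers);
}
```

A call such as `await insertWithWindow(i => collection.insertOne({ _id: i }), 1000000, 200)` keeps 200 inserts in flight. The value 200 is only a placeholder for N and would need to be tuned.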

vaggarwal · February 20, 2019, 1:41am

Thanks for the quick response. Will give it a shot.

vaggarwal · February 20, 2019, 2:09am

I tried it and was able to improve write performance, but it is still not comparable to loading the data into MongoDB.

mbhaskar · February 20, 2019, 6:15pm

It depends on a lot of factors. Some are related to your deployment configuration, some to design traits of Document Layer and of FoundationDB in general. I will try to explain the main points.

Write concern

The write concern you are running with will have a huge impact on MongoDB’s performance and consistency guarantees.

MongoDB: If you don’t specify one, the default is {w: 1, j: false}. That config only guarantees in-memory writes on a standalone or the primary. For stronger guarantees, you would need something like {w: majority, j: true}. That makes sure the majority of replicas have your write, persisted on disk. It is still eventual consistency, and in case of failures your writes could be rolled back. This is how writes work on standard MongoDB.
Document Layer: The goal of Document Layer, and of FoundationDB in general, is to provide the strong consistency we are used to in the SQL world. Document Layer simply ignores the write concern provided by the client. Irrespective of the write concern, writes are always done at full consistency. Once the write is acknowledged to the client, it is guaranteed to be persisted on disk and replicated, so your write is never lost in any kind of failure. To achieve this, FoundationDB/Document Layer needs to do more work than MongoDB, which makes Document Layer slower than standard MongoDB.
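
For reference, the two write concerns mentioned above can be written as option objects and passed per request. This is a rough sketch; the constant and helper names are made up for illustration, and `collection` is assumed to expose `insertOne(doc, options)` as the mongo shell and common drivers do.

```javascript
// The write concerns discussed above, as plain option objects.
var DEFAULT_CONCERN = { w: 1, j: false };          // acknowledged by the primary only, no journal wait
var DURABLE_CONCERN = { w: "majority", j: true };  // majority of replicas, journaled to disk

// Insert one document with an explicit write concern.
// `collection` is e.g. db.mycollections in the mongo shell.
function insertWithConcern(collection, doc, concern) {
  return collection.insertOne(doc, { writeConcern: concern });
}
```

Since Document Layer ignores the write concern, this option only changes the MongoDB side of the comparison; running MongoDB with `DURABLE_CONCERN` gives a somewhat closer comparison than the default.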

Concurrency

As both MongoDB and FoundationDB are distributed, they benefit quite a bit from more concurrency in your application. As @SteavedHams suggested in his response, you could add more parallelism to your test either by adding more threads or, even better, by making the test async (with a rate limiter). With the test you have above, you are pretty much running one request at a time on Document Layer, which runs one transaction at a time on FDB, so you are limited by the latency of each transaction. But what you are measuring is throughput: how long it takes to insert 1M documents. Making your test concurrent will probably help both MongoDB and Document Layer, but it will help Document Layer more, because it hides the latency.
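
To make the latency-versus-throughput point concrete, here is a small worked calculation using the 26-minute and 7-minute figures from the original post. The `estimatedMinutes` function is an idealized assumption: it supposes per-request latency stays the same as concurrency grows, which stops being true once the server saturates.

```javascript
// Per-insert latency implied by a sequential run of 1M inserts.
var DOCS = 1000000;
function msPerInsert(totalMinutes) {
  return (totalMinutes * 60 * 1000) / DOCS;
}

var docLayerMs = msPerInsert(26); // 1.56 ms per insert via Document Layer
var mongoMs = msPerInsert(7);     // 0.42 ms per insert via MongoDB

// Idealized estimate only: with `inFlight` requests outstanding and
// unchanged per-request latency, wall-clock time shrinks proportionally.
function estimatedMinutes(sequentialMinutes, inFlight) {
  return sequentialMinutes / inFlight;
}
```

So the sequential test is really measuring roughly 1.56 ms versus 0.42 ms per round trip, not what either system can sustain under concurrent load.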

Transactions

Document Layer works more like traditional SQL databases when it comes to transaction management: everything is a transaction. Either you start a transaction from the client and issue your requests as part of that transaction, or Document Layer starts a transaction for each and every request. Every transaction has some initiation cost for getting the current database version, getReadVersion(), and a commit at the end, which persists the changes made in the transaction. As each transaction has this overhead, just like in SQL databases, the best way to hide it is to do more in each transaction.

Implicit transactions: You don’t manage transactions on the client side; Document Layer creates a transaction for each MongoDB request. Do more in each request. This was my first suggestion in the previous post.
Explicit transactions: Manage transactions yourself using beginTransaction and commitTransaction and include more business logic there. These are custom commands that only work on Document Layer. Sadly, explicit transactions in Document Layer have some limitations regarding connections. We are rewriting them to be compatible with MongoDB v4.0 transactions.

If you compare Document Layer requests or transactions with MongoDB v4.0 transactions, that’s where Document Layer really shines, because transactions are primitive operations in FoundationDB.

Project maturity

Having said all that, there are quite a few improvements that will help bring the latency down further. For example, Document Layer manages metadata for every collection with a version, and every request reads that version to make sure the collection metadata hasn’t changed. There are ways we could cut this read from the critical path, reducing latency. This is one example; there are other improvements that will help performance. The project is in its initial stages. For us, the initial goal was to have a strongly consistent database. We will have performance improvements coming in to make these numbers better.

Finally…

To finish this long post, it is fair to say Document Layer is never going to do better if you are measuring the latency of a single write. Even though you are measuring 1M writes here, they happen sequentially, so you are limited by the latency of a single write. Document Layer’s strengths are better consistency guarantees and faster transactions, thanks to FoundationDB. As it’s running on FoundationDB, it inherits point-in-time backup/restore and fearless DR. To get better performance from Document Layer, we have to follow the same guidelines as for FoundationDB applications.

ryanworl · February 20, 2019, 6:35pm

@mbhaskar is the online schema change / schema caching project public? I’ve familiarized myself pretty thoroughly with the F1 online schema change system and would love to see how you’re doing it

vaggarwal · February 20, 2019, 6:49pm

Thanks for the detailed explanation. At this point I am not working on a cluster; it is just a single machine with no redundancy.

-Vishal

alexmiller · February 20, 2019, 9:26pm

At this precise moment, no. I’ve been harassing @Evan to post his proposal on this to the forums, so hopefully that’ll happen as soon as he stops debugging failed simulation tests.

SteavedHams · February 20, 2019, 11:51pm

How many fdbserver processes are you running on the machine?

vaggarwal · February 25, 2019, 7:02pm

Just one fdbserver

-Vishal

vaggarwal · February 25, 2019, 7:03pm

Increasing fdbserver will help increase the performance

-Vishal

aaerofeev · June 13, 2019, 1:08pm

Thank you for your work, this is awesome!
How can I use “Explicit Transactions” on the Mongo API side?

I tried to find these commands in the README-level documentation, but did not find anything = (

How soon will you develop mongodb 4.0 sessions & transactions in the document layer?
