Writing 1M rows via a Mongo client using Document Layer


(VISHAL AGGARWAL) #1

Writing 1M rows via a Mongo client using Document Layer is very slow compared to loading them into a MongoDB database. I got 26 minutes versus 7 minutes.

Here is the script:

```javascript
// Mongo shell: 1,000,000 single-document inserts, issued one request at a time.
for (var i = 1; i <= 1000000; i++) {
      db.mycollections.insert({
          "_id" : i,
          "item" : "canvas" + i,
          "qty" : 100 + i,
          "tags" : ["cotton"],
          "title" : "How do I create manual workload i.e., Bulk inserts to Collection ",
          " Iteration no:" : i
      });
}
```

Are any changes needed to make this run faster?


(Bhaskar Muppana) #2

Document Layer creates a separate FoundationDB transaction for each and every request and then commits it. This gives strong consistency guarantees for the data. At the same time, each transaction has some overhead, such as getting a read version and committing the transaction. On top of that, every request needs to read the metadata version to make sure the metadata hasn’t changed.

One easy way to get better performance would be to use bulk inserts. You can use insert_many() (insertMany() in the mongo shell) to insert multiple documents in each request. That should help with the performance.
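A minimal sketch of the bulk-insert suggestion, assuming a collection handle that exposes an async insertMany(docs) method, as the Node.js `mongodb` driver's Collection does. The helper names and the batch size are hypothetical; since Document Layer runs each request as one FoundationDB transaction, each insertMany() call here should cost one transaction instead of one per document.

```javascript
// Build the same document shape as the original loop for _id = i.
function makeDoc(i) {
  return {
    _id: i,
    item: "canvas" + i,
    qty: 100 + i,
    tags: ["cotton"],
    title: "How do I create manual workload i.e., Bulk inserts to Collection ",
    " Iteration no:": i,
  };
}

// Insert documents 1..total in batches of `batchSize` (assumed > 0).
// `collection` is any object with an async insertMany(docs) method.
// Returns the number of insertMany() requests issued.
async function insertInBatches(collection, total, batchSize) {
  let requests = 0;
  for (let start = 1; start <= total; start += batchSize) {
    const end = Math.min(start + batchSize - 1, total);
    const batch = [];
    for (let i = start; i <= end; i++) batch.push(makeDoc(i));
    await collection.insertMany(batch); // one request (one transaction) per batch
    requests++;
  }
  return requests;
}
```

With a batch size of 1000, for example, `insertInBatches(client.db("test").collection("mycollections"), 1000000, 1000)` (where `client` is a hypothetical connected MongoClient) issues 1,000 requests instead of 1,000,000. FoundationDB limits transaction size and duration, so very large batches can fail; it is worth measuring a few sizes.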


(Steve Atherton) #3

An alternative to inserting many documents per transaction would be to do many single-document insert transactions in parallel, which will hide the latency of starting and committing transactions. You wouldn’t want infinite parallelism, but a pattern where, for example, you dispatch N inserts initially and then dispatch a new insert each time a dispatched insert completes should work well. I’m not sure what the ideal value of N is, at least hundreds and possibly low thousands.
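The dispatch pattern described above can be sketched as a small pool of async workers: at most `limit` inserts are in flight, and whenever one completes, that worker dispatches the next. This is a sketch under assumptions: `insertOne` stands for any async single-document insert (for example a wrapper around a driver's collection.insertOne), and the function name is hypothetical.

```javascript
// Keep at most `limit` single-document inserts in flight for ids 1..total.
// `insertOne(id)` is any async function performing one insert.
async function insertWithBoundedConcurrency(total, limit, insertOne) {
  let next = 1; // next _id to dispatch, shared by all workers

  // Each worker claims an id, awaits its insert, then claims the next one.
  // JavaScript runs this on a single thread, so `next++` cannot race.
  async function worker() {
    while (next <= total) {
      const id = next++;
      await insertOne(id);
    }
  }

  // Start `limit` workers (fewer if there are fewer documents) and wait for all.
  const workers = [];
  for (let w = 0; w < Math.min(limit, total); w++) workers.push(worker());
  await Promise.all(workers);
}
```

A call such as `insertWithBoundedConcurrency(1000000, 500, (i) => collection.insertOne({ _id: i, item: "canvas" + i }))` keeps 500 transactions outstanding; the right `limit` has to be found by measurement. If any insert rejects, the returned promise rejects, while the other workers keep running.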


(VISHAL AGGARWAL) #4

Thanks for the quick response. I will give it a shot.


(VISHAL AGGARWAL) #5

I tried it and was able to improve write performance, but it is still not comparable to loading the data into MongoDB.


(Bhaskar Muppana) #6

It depends on a lot of factors: some related to your deployment configuration, and some related to the design traits of Document Layer and of FoundationDB in general. I will try to explain the main ones.

Write concern

The write concern you run with has a huge impact on MongoDB’s performance and consistency guarantees.

  • MongoDB: If you don’t specify one, the defaults are {w: 1, j: false}. That configuration only guarantees an in-memory write on the standalone server or the primary. For stronger consistency, you would need something like {w: "majority", j: true}, which makes sure that a majority of replicas have your write persisted on disk. It is still eventual consistency, and in case of failures your writes could be rolled back. This is how writes work in standard MongoDB.

  • Document Layer: The goal of Document Layer, and of FoundationDB in general, is to provide the strong consistency we are used to in the SQL world. Document Layer simply ignores the write concern provided by the client: irrespective of it, writes are always done at full consistency. Once a write is acknowledged to the client, it is guaranteed to be persisted on disk and replicated, so it is never lost in any kind of failure. To achieve this, FoundationDB and Document Layer need to do more work than MongoDB, which makes Document Layer slower than standard MongoDB.
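For a like-for-like comparison, the stronger MongoDB write concern mentioned above can be requested per call. A minimal sketch, assuming a collection handle whose insertMany(docs, options) accepts a `writeConcern` option, as the mongo shell's db.collection.insertMany() does; the helper name is hypothetical. Against Document Layer the option is ignored, so only the MongoDB side of the benchmark changes.

```javascript
// Majority-acknowledged, journaled writes: the stronger MongoDB setting
// discussed above. Document Layer ignores this and always commits fully.
const MAJORITY_JOURNALED = { writeConcern: { w: "majority", j: true } };

// Insert `docs` with the stronger write concern and return the driver's result.
function insertDurably(collection, docs) {
  return collection.insertMany(docs, MAJORITY_JOURNALED);
}
```

Running the original MongoDB load with this setting would show how much of the 7-versus-26-minute gap comes from the weaker default durability.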

Concurrency

As both MongoDB and FoundationDB are distributed, they benefit quite a bit from more concurrency in your application. As @SteavedHams suggested in his response, you could add parallelism to your test, either by adding more threads or, even better, by making the test async (with a rate limiter). With the test you have above, you are pretty much running one request at a time on Document Layer, which runs one transaction at a time on FDB, so you are limited by the latency of each transaction. But you are measuring throughput: how long it takes to insert 1M documents. Making your test concurrent will probably help both MongoDB and Document Layer, but more so for Document Layer, since it will help hide the latency.
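One common way to add the rate limiter mentioned above to an async test is a token bucket: each request takes a token, and tokens refill at a fixed rate up to a burst limit. This is a sketch, not something Document Layer provides; `ratePerSec` (assumed > 0) and `burst` are hypothetical tuning knobs, and the clock is injectable so the limiter can be tested without waiting.

```javascript
// Token-bucket rate limiter: allows `burst` requests at once, then
// `ratePerSec` requests per second on average.
class TokenBucket {
  constructor(ratePerSec, burst, now = () => Date.now()) {
    this.ratePerMs = ratePerSec / 1000;
    this.capacity = burst;
    this.tokens = burst;
    this.now = now;
    this.last = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity.
  refill() {
    const t = this.now();
    this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.ratePerMs);
    this.last = t;
  }

  // Take one token if available; false means the caller should wait.
  tryAcquire() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Sleep until a token is available, then take it.
  async acquire() {
    while (!this.tryAcquire()) {
      const waitMs = Math.ceil((1 - this.tokens) / this.ratePerMs);
      await new Promise((r) => setTimeout(r, waitMs));
    }
  }
}
```

In an async load test, each task would `await bucket.acquire()` before issuing its insert, which caps the request rate while still letting many requests overlap.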

Transactions

Document Layer works more like traditional SQL databases when it comes to transaction management: everything is a transaction. Either you start a transaction from the client and make your requests part of that transaction, or Document Layer starts a transaction for each and every request. Every transaction has an initiation cost, getting the current database version with getReadVersion(), and a commit at the end, which persists the changes made in the transaction. Because each transaction has this overhead, the best way to amortize it, just as with SQL databases, is to do more in each transaction.

  • Implicit transactions: You don’t manage transactions on the client side; Document Layer creates a transaction for each MongoDB request. Do more in each request; this was my first suggestion in the previous post.
  • Explicit transactions: Manage transactions yourself using beginTransaction and commitTransaction, and include more business logic in them. These are custom commands that only work on Document Layer. Sadly, explicit transactions in Document Layer have some limitations regarding connections. We are rewriting them to be compatible with MongoDB v4.0 transactions.

If you compare Document Layer requests or transactions with MongoDB v4.0 transactions, that is where Document Layer really shines, as transactions are primitive operations in FoundationDB.

Project maturity

Having said all that, there are quite a few improvements that will help bring the latency down further. For example, Document Layer manages metadata for every collection with a version, and every request reads that version to make sure the collection metadata hasn’t changed. There are ways we could take this read off the critical path, reducing latency. This is one example; there are other improvements that will help performance. The project is still in its initial stages, and our initial goal was to have a strongly consistent database. We will have performance improvements coming to make these numbers better.

Finally…

To finish this long post, it is fair to say Document Layer is never going to do better if you are measuring the latency of a single write. Even though you are measuring 1M writes here, they happen sequentially, so you are limited by the latency of a single write. Document Layer’s strengths are stronger consistency guarantees and faster transactions, thanks to FoundationDB. As it runs on FoundationDB, it also inherits point-in-time backup/restore and fearless DR. To get better performance from Document Layer, you have to follow the same guidelines as for FoundationDB applications.
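The numbers from the first post make the latency argument concrete. When writes run one at a time, total time divided by the number of writes is the average time per write. Little's law then gives a rough throughput estimate for concurrent requests; the sketch below is an idealized model (function names are illustrative) that assumes latency stays constant as concurrency rises, which will stop holding once the server, here a single fdbserver process, saturates.

```javascript
// Average time per write, in ms, when `writes` run sequentially in `totalMinutes`.
function perWriteMs(totalMinutes, writes) {
  return (totalMinutes * 60 * 1000) / writes;
}

// Little's law: with `inFlight` requests outstanding at `latencyMs` each,
// throughput is about inFlight / latencyMs writes per ms. Returns minutes.
function estimatedMinutes(writes, latencyMs, inFlight) {
  const writesPerMs = inFlight / latencyMs;
  return writes / writesPerMs / 60000;
}

console.log(perWriteMs(26, 1e6)); // Document Layer run: 1.56 ms per write
console.log(perWriteMs(7, 1e6)); // MongoDB run: 0.42 ms per write
```

Under this model, at 1.56 ms per write, keeping 100 writes in flight would bring 1M inserts down to roughly a quarter of a minute, which shows why concurrency and batching matter more here than per-write latency.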


(Ryan Worl) #7

@mbhaskar, is the online schema change / schema caching project public? I’ve familiarized myself pretty thoroughly with the F1 online schema change system and would love to see how you’re doing it.


(VISHAL AGGARWAL) #8

Thanks for the detailed explanation. At this point I am not working on a cluster; it is just a single machine with no redundancy.

-Vishal


(Alex Miller) #9

At this precise moment, no. I’ve been harassing @Evan to post his proposal on this to the forums, so hopefully that’ll happen as soon as he stops debugging failed simulation tests. :wink:


(Steve Atherton) #10

How many fdbserver processes are you running on the machine?


(VISHAL AGGARWAL) #11

Just one fdbserver

-Vishal


(VISHAL AGGARWAL) #12

Increasing the number of fdbserver processes will help increase the performance.

-Vishal