Bindings seem to be very slow for a simple operation

ex3ndr · August 5, 2020, 4:53am

I ran some tests of our task scheduler and found that a simple set operation can block for a considerable amount of time.

We have very simple code:

let buffer = Buffer.alloc(0);
await db.doTn(async (tx) => {
    for (let i = 0; i < 10000; i++) {
        let id = uuid(); // string
        tx.set(id, buffer);
    }
});

It turns out that calling set here costs ~150ms.

Our second attempt was to write binary UUIDs (16 bytes instead of 36), which gave roughly a 30% improvement, down to ~100ms. Then I disabled write conflicts in this transaction, which cut another ~30%, down to 70ms. Much better, but still slow.
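For scale, it helps to convert these totals into a per-call cost. A minimal sketch, assuming each figure is the total for the whole 10,000-iteration loop (the reading ajbeamon asks about later in the thread); the labels are just descriptive names:

```python
# Per-call cost implied by the reported totals.
# Assumption: each figure is the total for the whole 10,000-iteration loop.
N_SETS = 10_000

reported_ms = {
    "36-byte string UUID keys": 150,
    "16-byte binary UUID keys": 100,
    "binary keys, write conflicts disabled": 70,
}

def per_set_us(total_ms, n=N_SETS):
    """Average cost of one set() call, in microseconds."""
    return total_ms * 1000 / n

def sets_per_second(total_ms, n=N_SETS):
    """Throughput implied by the loop's total time."""
    return n / (total_ms / 1000)

for label, ms in reported_ms.items():
    print(f"{label}: {per_set_us(ms):.1f} us/set, {sets_per_second(ms):,.0f} sets/s")
```

Under that assumption the three runs work out to roughly 15, 10 and 7 µs per call, i.e. about 67,000, 100,000 and 143,000 calls per second.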

There is almost nothing between this JS call and fdb_transaction_set, not even a memcpy.

This is surprising, since I expected calling “set” to be an almost free operation. Is it true that it is better to offload code that works with the bindings to a separate thread to avoid blocking the main thread?

SteavedHams · August 5, 2020, 5:05am

        tx.set(id, buffer);

is a client-only operation that manipulates structures in memory and should be instantaneous.

await db.doTn(async (tx) => {
    for (let i = 0; i < 10000; i++) {
        let id = uuid(); // string
        tx.set(id, buffer);
    }
});

probably is not client-only: it is likely executing a whole transaction, which involves talking to the cluster to get a read version, then submitting the transaction contents (sets, clears, conflict ranges) to the cluster to commit it, and waiting for the result of the commit.

Which operation are you measuring?

ex3ndr · August 5, 2020, 5:17am

No, it is all local, but still slow. I am measuring only the inner code, not the whole transaction.

SteavedHams · August 5, 2020, 6:46am

So you are including this in your timing measurement as well?

        let id = uuid(); // string

And is this generating a random UUID? That is generally not a very fast operation. For example, see this Python code:

import fdb, uuid, time, random
fdb.api_version(600)

count = 10000
db = fdb.open()

tr = db.create_transaction()
startTime = time.perf_counter()
for x in range(count):
  tr.set(str(uuid.uuid4()).encode(), b'')  # 36-byte textual UUID key
print("set %d UUIDS: %dms" % (count, 1000 * (time.perf_counter() - startTime)))

tr = db.create_transaction()
startTime = time.perf_counter()
for x in range(count):
  tr.set(str(int(random.random() * 1000000)).encode(), b'')  # short decimal key
print("set %d ints: %dms" % (count, 1000 * (time.perf_counter() - startTime)))

Typical output:

set 10000 UUIDS: 246ms
set 10000 ints: 61ms
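Key generation can be checked separately from set() without touching the database at all: time generating the keys on their own. A minimal standard-library sketch (the helper names are hypothetical and no FDB calls are made), covering the textual UUIDs, binary UUIDs and short int keys discussed in this thread:

```python
import random
import time
import uuid

def time_key_generation(make_key, count=10_000):
    """Return (elapsed_ms, keys) for generating `count` keys; no database calls."""
    start = time.perf_counter()
    keys = [make_key() for _ in range(count)]
    elapsed_ms = 1000 * (time.perf_counter() - start)
    return elapsed_ms, keys

def string_uuid_key():
    return str(uuid.uuid4()).encode()  # 36-byte textual UUID

def binary_uuid_key():
    return uuid.uuid4().bytes  # 16-byte binary UUID

def int_key():
    return str(int(random.random() * 1000000)).encode()  # short decimal key

if __name__ == "__main__":
    for name, fn in [("string UUIDs", string_uuid_key),
                     ("binary UUIDs", binary_uuid_key),
                     ("ints", int_key)]:
        ms, _ = time_key_generation(fn)
        print(f"generate 10000 {name}: {ms:.1f}ms")
```

Subtracting these numbers from the timings of the loops that also call set() gives a rough idea of how much of the total is the set() calls themselves; absolute values depend on the machine.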

SteavedHams · August 5, 2020, 6:54am

Ah, sorry, I am tired; I forgot about the part where you simplified the UUID generation.

So my python script basically confirms your results. This appears to be the cost of 10,000 round-trips to the FDB network thread and back for the set() call.

ex3ndr · August 5, 2020, 6:58am

Plain UUID generation takes under 20ms in my tests, and I have already adjusted my measurement for it.

Also, increasing the amount of data increases the latency too.

SteavedHams · August 5, 2020, 7:02am

This speed represents about 160,000 set() operations/s, and I believe each one has to queue some operation which the FDB network thread executes. (Can someone confirm? @ajbeamon?)

That does not seem like an unreasonable speed, but it certainly is a bottleneck for many small sets.

Perhaps we should look into some sort of multi-set API to enable sending a group of KV pairs as a batch across the thread boundary.
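The difference between per-call handoffs and a batched multi-set can be illustrated with plain Python threads. This is a toy sketch, not the FDB client or any existing FDB API: a producer hands key/value pairs to a consumer thread through a `queue.Queue`, either one pair per handoff (like individual set() calls) or in batches (like the hypothetical multi-set). It counts handoffs rather than timing them, since timings vary by machine:

```python
import queue
import threading

SENTINEL = None  # tells the consumer thread to stop

def run(pairs, batch_size):
    """Send key/value pairs to a consumer thread, batch_size pairs per handoff.

    Returns (store, handoffs): the dict built by the consumer thread and the
    number of queue handoffs that crossed the thread boundary.
    """
    q = queue.Queue()
    store = {}  # written only by the consumer thread, read after join()

    def consumer():
        while True:
            batch = q.get()
            if batch is SENTINEL:
                return
            for key, value in batch:
                store[key] = value

    t = threading.Thread(target=consumer)
    t.start()
    handoffs = 0
    for i in range(0, len(pairs), batch_size):
        q.put(pairs[i:i + batch_size])  # one cross-thread handoff per batch
        handoffs += 1
    q.put(SENTINEL)
    t.join()
    return store, handoffs

if __name__ == "__main__":
    pairs = [(f"key{i}".encode(), b"") for i in range(10_000)]
    single, n_single = run(pairs, batch_size=1)
    batched, n_batched = run(pairs, batch_size=500)
    print(n_single, n_batched, single == batched)
```

With 10,000 pairs, `batch_size=1` needs 10,000 handoffs while `batch_size=500` needs 20, and both produce the same final contents, so any fixed per-handoff overhead (locking, waking the other thread) is paid 500 times less often.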

ex3ndr · August 5, 2020, 7:07am

TBH, I hardly find this a good time; scheduling work on the network thread shouldn’t take that long. I think there is something more to it: it probably simply blocks until the network thread answers, but we don’t really need to wait for the network thread to answer with a void.

A multi-set would indeed be a useful feature.

ajbeamon · August 5, 2020, 3:33pm

Just to be clear, are you describing the total cost of all 10000 sets? In other words, each set would be 15 microseconds?

The call to set doesn’t block; it should just schedule a task on the main thread (in FDB client terms, the network thread) for future execution. As part of this, it does also copy the data you are setting, and it briefly acquires a lock that could potentially contend with other simultaneous calls from your client (of which I assume there are none, based on your example).
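The behavior described above (copy the data, take a lock briefly, enqueue work for the network thread, return) can be modelled as a toy to show why set() itself should not wait on the network thread. The `ToyTransaction` class below is hypothetical and only mimics that description; it is not the FoundationDB implementation. The network thread is held back by an `Event`, so it is observable that set() returns before any work is applied:

```python
import threading
from collections import deque

class ToyTransaction:
    """Toy model: set() snapshots its arguments, takes a lock briefly,
    enqueues a task for a separate 'network' thread, and returns without
    waiting. Not the FoundationDB implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._wakeup = threading.Condition(self._lock)
        self._tasks = deque()
        self.applied = {}  # written only by the network thread

    def set(self, key, value):
        # bytes() snapshots mutable buffers such as bytearray
        key, value = bytes(key), bytes(value)
        with self._wakeup:  # brief critical section, then return
            self._tasks.append((key, value))
            self._wakeup.notify()

    def network_loop(self, release, n_expected):
        release.wait()  # simulate a network thread that is busy elsewhere
        done = 0
        while done < n_expected:
            with self._wakeup:
                while not self._tasks:
                    self._wakeup.wait()
                key, value = self._tasks.popleft()
            self.applied[key] = value
            done += 1

if __name__ == "__main__":
    release = threading.Event()
    tr = ToyTransaction()
    net = threading.Thread(target=tr.network_loop, args=(release, 3))
    net.start()
    for k in (b"a", b"b", bytearray(b"c")):
        tr.set(k, b"")  # returns immediately; the network thread is still parked
    print("applied before release:", len(tr.applied))
    release.set()
    net.join()
    print("applied after release:", len(tr.applied))
```

Running it prints 0 applied writes before the release and 3 after, because the network thread stays parked on `release.wait()` until the `Event` is set, while each set() call only pays for the snapshot and the short critical section.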

The fact that you see a change when disabling write conflicts is interesting, as I would expect that to only have an effect on the network thread. Is your transaction definitely running on a different thread than the network thread (i.e. you aren’t running it in a callback from the network thread)? If you are on a different thread, then there may be some other interaction effect at play.

ex3ndr · August 5, 2020, 9:44pm

Yes, this is Node.js, and we are calling the bindings from within the Node.js event loop. If the networking layer were running on the same thread… that would be much worse.

There are no parallel threads at all, just two: the main thread and the network thread. There can’t be any concurrent requests, and also, IIRC, the C bindings must be called from the same thread that created them, so concurrency is not possible.
