Throughput of a Queue/Set like data structure

biturbo · April 22, 2018, 5:16pm

Hey guys,

I’m new to FoundationDB and have only started playing with it. I’d like to use it to implement a queue / work distribution mechanism. I’m not too fussed about the order of the elements but would prefer it if the overhead was small.

I am running a single instance fdb server and a single client (with two threads on a dual core '17 Macbook Pro), but I’m only managing a throughput of about 120 removals per second. My transaction consists of finding an element and clearing it.

(I am using the queue example in the documentation, but I have changed it to make the get/clear transacted)

Is this normal throughput for a single server (no cluster)?

Thank you

Biturbo

dave · April 22, 2018, 5:51pm

No, that speed implies that you are latency bound. Either you are only removing items from the queue sequentially (in which case you are latency bound by definition) or your queue design makes your transactions all conflict (so that only one is happening at a time). At a guess you should be able to do some tens of thousands of operations per second on a single server if you get your workload properly parallel.

biturbo · April 22, 2018, 6:07pm

Thanks Dave

I’m using the example from the documentation, and the client and server are running on the same machine.

How would you recommend designing a high throughput queue where the elements are removed one by one? I’m ok with no ordering.

More specifically, how can I efficiently implement “get and remove any KV”?

Apologies if I’m thinking of this the wrong way.

Thanks

dave · April 22, 2018, 6:24pm

Totally apart from queue design, you need to separate latency (“how fast can I do one operation on this data structure”) and throughput (“how many operations can (a huge number of clients) do on this data structure per second”)? FoundationDB’s latencies can be pretty low in the scheme of things - a tuned server setup with nice disks and network might be twice as fast as your laptop even with the overhead of communications between multiple machines - but they can never be good enough for a single client doing things one at a time to keep up with the throughputs it is capable of. To remove just 10,000 items per second from the queue one at a time you would have to have a whole-transaction latency of 100 microseconds, which is not possible with current hardware. And making the cluster bigger can’t help with this - nine women can’t have a baby in one month.

This is basically true of any distributed system, or of any system that offers durability (so that it has to sync to disk on every operation), and of course FoundationDB is both. For that matter in modern hardware even RAM has significantly higher throughput than 1/latency!

The usual use case for a “work queue” is that lots of processes/threads/actors/whatever are putting work on the queue and lots are taking work off the queue. It should be possible for, say, 100 parallel transactions to remove 100 different items from the queue in about the same time as 1 transaction can remove 1 item from the queue. If this isn’t true, it’s a problem with the design of your data model or transactions.

If the problem is that you really have a single process that wants to consume items from a queue sequentially, then the key is to not remove them one at a time! One process scanning through an ordered log with range reads (and erasing items with range clears) should be able to get much better performance, though it still won’t be able to keep a whole FoundationDB cluster busy.

biturbo · April 22, 2018, 6:56pm

Thank you again for taking the time to reply.

I totally understand what you are saying, and of course throughput is bounded by latency. My question was more around the fact that I don’t know how to structure the transaction in order to allow sufficient concurrency, thus (by adding more parallel clients) increasing throughput.

More specifically, I believe the “primitive” I require is to “remove and return any element” (because I don’t care about order), and I am not sure how to go about doing that.

What I have currently is something that removes the first element in the range, which obviously results in transaction contention. What I am wondering is whether it is (even theoretically) possible to find a random key that exists without transaction contention, and if so any pointers as to how to achieve that in FoundationDB?

Edit: The ultimate goal is to prevent different workers from working on the same work items.

Many thanks

Evan · April 22, 2018, 7:31pm

Our backup and DR are implemented as clients of the database. We wrote a task bucket abstraction on top of the database to prevent different workers from doing the same work. You might be interested in looking at how we implemented that to give you some ideas for what you are planning.

github.com

apple/foundationdb/blob/master/fdbclient/TaskBucket.h

/*
 * TaskBucket.h
 *
 * This source file is part of the FoundationDB open source project
 *
 * Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

This file has been truncated. show original

dave · April 22, 2018, 7:32pm

Take a look at foundationdb/layers/taskbucket/__init__.py. It does more or less the same thing I think you are trying to do. I don’t think it’s extraordinarily optimized, but it should get you past the barrier of being totally latency bound.

The basic principle is to insert items with random keys, and to pick a random item by picking a random key and looking for the “next” item. (Some extra care to manage conflict ranges prevents remove transactions from conflicting unnecessarily with transactions adding new items that “would have been” removed by the remove transaction)

biturbo · April 22, 2018, 7:42pm

Thank you both, I will look into that implementation.

Topic		Replies	Views
Transaction/operation throughput Using FoundationDB performance	10	1982	January 23, 2020
How to troubleshoot throughput performance degrade? Using FoundationDB performance	35	4342	June 20, 2019
How to achieve (1000 tx 1000 reads each)/sec on single client Using FoundationDB performance	24	3114	December 15, 2021
CometDB: A FoundationDB queue manager for Erlang Using FoundationDB bindings	4	930	June 29, 2023
Relax consistency guarantees Using FoundationDB	17	2250	October 30, 2019

Throughput of a Queue/Set like data structure

Related topics