Totally apart from queue design, you need to separate latency (“how fast can I do one operation on this data structure?”) from throughput (“how many operations can a huge number of clients do on this data structure per second?”). FoundationDB’s latencies can be pretty low in the scheme of things - a tuned server setup with nice disks and network might be twice as fast as your laptop even with the overhead of communication between multiple machines - but they can never be low enough for a single client doing things one at a time to keep up with the throughput the cluster is capable of. To remove just 10,000 items per second from the queue one at a time, you would need a whole-transaction latency of 100 microseconds, which is not possible with current hardware. And making the cluster bigger can’t help with this - nine women can’t have a baby in one month.
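The arithmetic is just Little’s law with one request in flight: a strictly serial client can never exceed 1/latency operations per second, no matter how big the cluster is. A quick sketch (the 2 ms figure below is an illustrative latency, not a measured one):

```python
# A serial client has at most one operation in flight, so its
# throughput is capped at 1 / latency (Little's law, concurrency = 1).

def max_serial_ops_per_sec(latency_seconds: float) -> float:
    return 1.0 / latency_seconds

def required_latency(target_ops_per_sec: float) -> float:
    return 1.0 / target_ops_per_sec

# Dequeuing 10,000 items/sec one at a time needs 100 us transactions.
assert required_latency(10_000) == 100e-6

# An illustrative whole-transaction latency of ~2 ms caps a serial
# client at ~500 ops/sec, far below what the cluster can sustain.
print(max_serial_ops_per_sec(0.002))
```

Adding machines changes none of these numbers for a single serial client; only overlapping many operations does.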
This is basically true of any distributed system, or of any system that offers durability (so that it has to sync to disk on every operation) - and of course FoundationDB is both. For that matter, on modern hardware even RAM has a throughput significantly higher than 1/latency!
The usual use case for a “work queue” is that lots of processes/threads/actors/whatever are putting work on the queue and lots are taking work off the queue. It should be possible for, say, 100 parallel transactions to remove 100 different items from the queue in about the same time as 1 transaction can remove 1 item from the queue. If this isn’t true, it’s a problem with the design of your data model or transactions.
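One common data model that makes this work is sharding the queue so that concurrent consumers touch disjoint key ranges and their transactions can’t conflict with each other. Here is a hedged in-memory sketch of that idea (plain Python standing in for the keyspace - the shard layout and names are illustrative, not the FoundationDB API; in FDB each shard would be a key prefix like `("queue", shard, seq)`):

```python
# Illustrative model: items are spread across N shards, and each
# consumer pops only from its own shard, so parallel consumers
# read and clear disjoint "key ranges" and never contend.

import threading
from collections import deque

NUM_SHARDS = 4
shards = [deque() for _ in range(NUM_SHARDS)]

# Producer side: spread 1000 work items across the shards.
for i in range(1000):
    shards[i % NUM_SHARDS].append(i)

consumed = [[] for _ in range(NUM_SHARDS)]

def consumer(shard_id: int) -> None:
    # Only this consumer touches this shard, so there is no
    # cross-consumer conflict to retry.
    q = shards[shard_id]
    while True:
        try:
            consumed[shard_id].append(q.popleft())
        except IndexError:
            return

threads = [threading.Thread(target=consumer, args=(s,))
           for s in range(NUM_SHARDS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(len(c) for c in consumed)
print(total)  # 1000: every item consumed exactly once
```

The same shape holds in a real FDB data model: as long as transactions from different consumers read and write disjoint keys, the optimistic concurrency control never has to retry them, and throughput scales with the number of consumers.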
If the problem is that you really have a single process that wants to consume items from a queue sequentially, then the key is to not remove them one at a time! One process scanning through an ordered log with range reads (and erasing items with range clears) should be able to get much better performance, though it still won’t be able to keep a whole FoundationDB cluster busy.
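The batch pattern can be sketched with a toy ordered store standing in for the FDB keyspace (`consume_batch` and the `log/` prefix are hypothetical names, and the sorted-dict scan is just a stand-in for a real range read plus range clear):

```python
# Toy ordered key-value "keyspace": one batch per "transaction" -
# a limited range read followed by a single range clear, instead
# of one round trip per item.

store = {f"log/{i:010d}".encode(): f"item-{i}".encode()
         for i in range(2500)}

def consume_batch(store: dict, prefix: bytes = b"log/", limit: int = 1000):
    """Read up to `limit` items in key order, then erase that range."""
    keys = sorted(k for k in store if k.startswith(prefix))[:limit]
    batch = [(k, store[k]) for k in keys]
    for k in keys:  # the range-clear equivalent
        del store[k]
    return batch

batches = []
while True:
    b = consume_batch(store)
    if not b:
        break
    batches.append(b)

# 2500 items drained in 3 batches (1000 + 1000 + 500).
print(len(batches), sum(len(b) for b in batches))
```

Three round trips instead of 2,500 is the whole trick: the per-transaction latency is unchanged, but each transaction now moves a thousand items, so the serial consumer’s effective throughput goes up by roughly the batch size.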