FoundationDB

DPDK networking interface


(Ryan Worl) #1

Given the well-factored and testable nature of the FoundationDB networking interface, has anyone explored an implementation of the interface using the DPDK?

Removing the context switches from packet processing may benefit latency and throughput, since it also avoids copying and locks. The single-threaded model already works well if you assign one thread to each queue on the NIC: no locks or context switches are needed.

ScyllaDB, for example, uses this architecture: one thread per NIC queue, with the database sharded across the NIC queues within each machine in the cluster.
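
To make the thread-per-queue idea concrete, here is a minimal sketch of routing keys to shards, assuming one shard per NIC queue. The names `shard_for_key` and `ShardedStore` are hypothetical, and the hash is just `std::hash`; this is not ScyllaDB's or Seastar's actual partitioning scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: pick the shard (and therefore the NIC queue and
// thread) that owns a key. Assumes num_queues > 0.
std::size_t shard_for_key(const std::string& key, std::size_t num_queues) {
    assert(num_queues > 0);
    return std::hash<std::string>{}(key) % num_queues;
}

// Hypothetical store with one independent map per shard. In a real
// shard-per-core design each thread would touch only its own map, which is
// why no locks are needed. This single-threaded sketch only shows routing.
class ShardedStore {
public:
    explicit ShardedStore(std::size_t num_queues) : shards_(num_queues) {
        assert(num_queues > 0);
    }

    void put(const std::string& key, const std::string& value) {
        shards_[shard_for_key(key, shards_.size())][key] = value;
    }

    // Returns nullptr if the key is absent from its owning shard.
    const std::string* get(const std::string& key) const {
        const auto& shard = shards_[shard_for_key(key, shards_.size())];
        auto it = shard.find(key);
        return it == shard.end() ? nullptr : &it->second;
    }

private:
    std::vector<std::unordered_map<std::string, std::string>> shards_;
};
```

Because a key always maps to the same shard, a request for that key can be handled start to finish by the one thread that owns it.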


(Alex Miller) #2

I’ve looked into it briefly, and it would be nice. Our commit flow involves a decent number of network hops, so lowering the networking latency with a better networking stack would directly improve our minimum possible commit latency.

However, the awkward part is that both FDB and DPDK want to control the main run loop. FDB needs to control it to be able to do deterministic scheduling, but DPDK normally wishes to control it instead. So it’d be a non-trivial, but cool change.


(Ryan Worl) #3

That is a complication I hadn’t considered. :confused:

If there is a way to reconcile that in the future, the Seastar framework from ScyllaDB already has an Apache-licensed implementation of all the DPDK code they needed for their database.

For reference: https://github.com/scylladb/seastar


(David Scherer) #4

I haven’t studied DPDK, but there’s no obvious reason you couldn’t write an INetwork implementation that would hand over control of the run loop to DPDK or Seastar or whatever, although a “reactor” interface would be a little easier. The amount of code to be written is probably on the order of what’s in Net2.actor.cpp now. Be wary, though: this code won’t be under test in simulation!
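
One shape the run-loop question could take is sketched below, under the assumption that the network backend can expose a non-blocking `poll()` that the application's loop calls once per iteration. `PollableBackend` and `RunLoop` are hypothetical names for illustration; this is neither FDB's `INetwork` nor any DPDK or Seastar API, and the in-memory queue stands in for NIC receive queues.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical poll-mode backend. A real one would drain NIC receive
// queues; this one drains an in-memory queue of fake packets.
class PollableBackend {
public:
    void inject(int packet) { rx_.push_back(packet); }  // test hook

    // Non-blocking: deliver up to max_burst pending packets, return count.
    std::size_t poll(std::size_t max_burst,
                     const std::function<void(int)>& on_packet) {
        std::size_t n = 0;
        while (n < max_burst && !rx_.empty()) {
            int p = rx_.front();
            rx_.pop_front();
            on_packet(p);
            ++n;
        }
        return n;
    }

private:
    std::deque<int> rx_;
};

// Hypothetical application-owned loop: runs queued tasks in FIFO order,
// then polls the network once, so task ordering stays under its control.
class RunLoop {
public:
    explicit RunLoop(PollableBackend& net) : net_(net) {}

    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // One iteration. Returns true if any task ran or any packet arrived.
    bool run_once(const std::function<void(int)>& on_packet) {
        bool worked = false;
        std::size_t pending = tasks_.size();  // tasks posted now run next time
        for (std::size_t i = 0; i < pending; ++i) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
            worked = true;
        }
        if (net_.poll(32, on_packet) > 0) worked = true;
        return worked;
    }

private:
    PollableBackend& net_;
    std::deque<std::function<void()>> tasks_;
};
```

The opposite arrangement, where the backend owns the loop and calls back into the application, would invert `run_once`: the backend's loop would invoke a hook that drains `tasks_`.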

I think the biggest potential performance improvement might actually be read throughput on large clusters, where batching of multiple reads to the same storage server can’t work very well and so reads tend to be limited by the performance of the network stack.
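
A small illustration of why batching degrades as the cluster grows, assuming a fixed set of reads spread evenly over the storage servers. The modulo placement in `batch_by_server` is a hypothetical stand-in chosen only to make the arithmetic visible; it is not how FDB assigns keys to servers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical placement: read i is served by storage server i % num_servers.
// Returns, per server, the list of reads that could share one network message.
std::map<std::size_t, std::vector<std::size_t>> batch_by_server(
    std::size_t num_reads, std::size_t num_servers) {
    assert(num_servers > 0);
    std::map<std::size_t, std::vector<std::size_t>> batches;
    for (std::size_t i = 0; i < num_reads; ++i) {
        batches[i % num_servers].push_back(i);
    }
    return batches;
}
```

With 64 reads and 4 servers this yields 4 batches of 16 reads each; with 64 servers it yields 64 batches of one read each, so every read pays the full per-message cost of the network stack.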