FoundationDB

Bandwidth consumption from one fdb client to fdbserver process


#1

I’m trying to reason about the bandwidth consumption of a single fdb client and the fdbserver processes under different replication factors. Here is what my experiment looks like.

I have a single fdb client and 4 fdbserver processes on 4 machines (i.e. one process per machine) with redundancy mode ‘double’. Using the go-ycsb benchmark I can do 1K updates per second with a value size of 10KB (the key size is less than 100 bytes). I understand that the actual bandwidth would be larger than 10MB/sec, but the observed outgoing bandwidth on this client machine is ~100MB/sec according to iftop. I repeated the experiment multiple times and made sure that no other processes were consuming significant bandwidth during the experiment.

Now I’m trying to reason about this 10x relationship between the actual data being updated (10K * 1K/sec == 10MB/sec) and the bandwidth consumed.

As I read the write path diagram, the client needs to send the mutation to the proxy (which would send to tLog servers) and it talks to the storage server directly. Since the replication factor is 2 in my case, I’d have 2 + 1 == 3 times amplification. Also, the encoding of the key-value pairs has an overhead of 2x, therefore the above number would become 6 now but it still doesn’t match the observed 10x number.

(I also repeated the experiment with a 1KB value and still observed ~10x bandwidth amplification.)

Any pointers to how to explain the relationship between the actual data being requested vs the actual consumed bandwidth would be appreciated.


(A.J. Beamon) #2

Could you provide some more details about the specifics of what your test is doing? For example, is it simply doing 1K writes/s with ~10KB/write, or is it also doing some reads? If it is doing reads, are there many conflicts? You can tell that via status in fdbcli, which has a field for conflict rate. It might also be interesting to know the difference between the reported ‘transactions started’ and ‘transaction committed’. And just to be clear, you are only measuring the bandwidth of the client and not any server process, correct?

The client talks to the proxy at transaction start time (to get the read version), but I don’t think the size of this communication is significant.

For reads, the client talks directly to the storage servers. In some cases, though, it may have to ask the proxies which storage server to talk to if its cache of data locations is incomplete or wrong.

For commits, the client should be sending a message to only one proxy under normal circumstances. The proxy is then the one that fans it out to multiple transaction logs. The client doesn’t talk to the storage nodes at all during a commit.

The commit message contains all of the mutations being committed as well as the conflict ranges for the transaction. Unless you disabled them, there will be a read and/or write conflict range for each key (or range) read and/or written. Conflict ranges include both the start and end of the range, and for single key reads or writes, this means you more or less submit the key twice (but no values). Thus, the total size of the commit request that did only single key reads and writes should be roughly (size of mutations) + 2*(size of keys read) + 2*(size of keys written), plus some overhead. There’s definitely some room to optimize the amount of space used for single-key ranges, but given the relative sizes of your keys and values it shouldn’t be a huge penalty.
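
As a rough illustration of that estimate, here is a minimal sketch in Python. It assumes each mutation carries one key plus one value, takes 100 bytes as an upper bound on the key size, and ignores the fixed per-request overhead; the function name is made up for this example and is not part of any FoundationDB API.

```python
def estimated_commit_bytes(value_size, key_size, keys_read=1, keys_written=1):
    """Approximate commit request size, ignoring fixed per-request overhead.

    Assumption: each mutation is one key plus one value, and every key read
    or written adds a conflict range that carries the key roughly twice.
    """
    mutations = keys_written * (key_size + value_size)
    read_conflicts = 2 * keys_read * key_size
    write_conflicts = 2 * keys_written * key_size
    return mutations + read_conflicts + write_conflicts


# One read-modify-write of a 10,000-byte value under a key of at most 100 bytes.
per_commit = estimated_commit_bytes(value_size=10_000, key_size=100)

# Scale to 1K commits per second and express in MB/sec (1 MB = 10**6 bytes).
per_second_mb = per_commit * 1_000 / 1e6

print(f"{per_commit} bytes per commit, about {per_second_mb} MB/sec at 1K commits/sec")
```

Under these assumptions the conflict ranges add only a few percent on top of the value itself, so the commit traffic alone should stay close to the raw data rate.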

There’s various other communication that happens. I think the client maintains connections with everybody it’s talked to, and there are some heartbeats that go back and forth. There’s also some communication with the cluster controller, but I don’t think any of that should be significant. There may be other things I’m forgetting, too.

If the details of your workload don’t suggest an explanation, we may have to dig a little deeper to track down what’s going on.


#3

Thanks a lot for the detailed explanation; my setup is described below. Based on your comment and my key and value sizes, the bandwidth for each commit should be dominated by the mutation size, which should be ~10KB per request (because the key is less than 100 bytes). But I’m definitely seeing way higher bandwidth than 10MB/sec at 1K QPS, and yes, I’m measuring on the machine where the benchmark ran, not on the fdb processes.

Benchmark https://github.com/pingcap/go-ycsb/

The setup is as mentioned above, 4 fdb processes (built using SHA1 ca720e15403bbe05a28a02b67d54b84bed9e09e3) on 4 machines respectively. These are 32 core machines and have plenty of RAM. No CPU affinity has been set. The redundancy mode was set as ‘double’.

For ‘update’, the benchmark reads the requested key, modifies the value, and writes it back, as in the code here.

For the 10KB value benchmark, the property file is as follows:
recordcount=1000000
operationcount=10000000
workload=core
readallfields=true
readproportion=0.0
updateproportion=1.0
scanproportion=0.0
insertproportion=0.0
requestdistribution=uniform
fieldcount=1
fieldlength=10000
maxscanlength=10

The constructed key is less than 100 bytes and the constructed values are x number of bytes, which is 10K here.

The ‘key’ is generated by hashing a random number between 0 and the recordcount (it’s 1 million here).

Load the data by

./go-ycsb load fdb -P update-only-load -p fdb.cluster=fdb.cluster --threads 200

Run it by

./go-ycsb run fdb -P update-only-load -p fdb.cluster=fdb.cluster --threads 200

The bandwidth monitored was the outbound bandwidth from the machine where the benchmark is run to the fdb processes.

I repeated the experiment; the numbers were similar to, but not exactly the same as, those in the original post.
For the 10KB value size, go-ycsb reported a QPS of 1.4K/sec (updated every 10 sec, so it’s an averaged/smoothed QPS).
The outbound bandwidth fluctuated between 100MB/sec and 130MB/sec. Among the 4 fdb processes, the benchmark transmitted a large amount of data to two of them; each almost always received half of the total outbound bandwidth.

fdbcli status reported a QPS that fluctuated around 1.4K, sometimes dropping to 1K. No conflicts were observed. A typical output looks like

Read rate - 1476 Hz
Write rate - 1441 Hz
Transactions started - 1417 Hz
Transactions committed - 1441 Hz
Conflict rate - 0 Hz

I also repeated it with a 1KB value setup (by changing the fieldlength from 10000 to 1000), and the result was similar.

go-ycsb reported a QPS fluctuating between 5K/sec and 7K/sec, and the outbound bandwidth fluctuated between 70MB/sec and 90MB/sec. There was consistently about 1 conflict per second, as in the following status output.
Workload:
Read rate - 9390 Hz
Write rate - 9536 Hz
Transactions started - 9534 Hz
Transactions committed - 9533 Hz
Conflict rate - 1 Hz

fdb sees a higher QPS than go-ycsb reports, due to transaction retries (I assume), but is it normal for 1 conflict per second to cause an extra 1K QPS?


(A.J. Beamon) #4

I ran a similar test and wasn’t able to reproduce the issue, but I did notice that iftop is reporting results in megabits rather than megabytes. Is there any chance that’s the source of confusion?
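
To show how much the unit matters here, the following sketch converts the reported readings from megabits to megabytes, assuming they were in fact megabits per second, and compares them with the raw value payload at the reported 1.4K updates/sec of 10,000-byte values. The helper name is just for illustration.

```python
def megabits_to_megabytes(mbit_per_s):
    """Convert a rate in megabits/sec to megabytes/sec (8 bits per byte)."""
    return mbit_per_s / 8


# Readings reported for the 10KB test, reinterpreted as megabits/sec.
low_mb = megabits_to_megabytes(100)
high_mb = megabits_to_megabytes(130)

# Raw value payload: 1.4K updates/sec * 10,000 bytes, in MB/sec (1 MB = 10**6 bytes).
payload_mb = 1_400 * 10_000 / 1e6

print(f"readings: {low_mb} to {high_mb} MB/sec, raw payload: {payload_mb} MB/sec")
```

If the readings are megabits, the observed range brackets the raw payload rate, which would mean there is little or no amplification to explain.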

It doesn’t look like there are many retries happening from your sampled status output, as the transactions started and committed rates are very similar. Also, each conflict should result in only one retry. I’m not sure why there’s a disparity between what the client and cluster report here.


#5

Ah, crap. I’m sorry. Usually I have alias iftop='iftop -B' set, but these machines don’t have it and I assumed they did. That’s where the misunderstanding came from. Thanks a lot for the investigation.