Question on parallel commit latencies

(Paul J Davis) #1

Hiya,

I’ve been working on writing some prototype code for a CouchDB layer and have finally gotten far enough to start analyzing various requests at the transaction-tracing level. I’ve come across a somewhat curious behavior that I’m having trouble understanding based on my reading of the commit protocols in FoundationDB.

The following gantt chart shows seven transactions in total. The first transaction writes a single document. The second writes two documents. Then five write transactions are spawned in parallel, each writing five documents.

My question has to do with the way the commits end at different times. With five document writes across five parallel transactions, I’m seeing each commit end about 1-2ms after the previous one. Tweaking that test to write fifty documents per transaction (i.e., 250 total) across the five transactions extends that smear to something along the lines of 5-10ms between commit finish times.

I’m not particularly concerned about the absolute values on timing here, as this is just a trace against a local dev server. What I’m finding curious, though, is that according to the versionstamps written as part of the parallel transactions, I am getting multiple transactions into a single batch. My understanding is that proxies batch transactions, which are then submitted to resolvers and eventually the transaction logs. Given that, I would have expected any transactions that were part of the same batch to get a response from the proxy at roughly the same time, rather than what appears to be some sort of serialization of each commit.

Granted, it’s also conceivable that even though transactions move through the system in a batch, the transaction logs end up making each transaction durable individually, which could explain what I’m seeing. But based on my shallow understanding of the process it was surprising enough that I figured I’d ask for help understanding whatever I’m missing.
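For reference, the way I’m inferring batch membership from versionstamps looks roughly like the sketch below. It assumes the documented layout of a 12-byte client versionstamp: an 8-byte big-endian commit version, a 2-byte batch order within that commit, and a 2-byte user version. Two transactions sharing the commit version but with different batch orders were committed in the same batch. The sample byte strings here are made up for illustration:

```python
import struct

def parse_versionstamp(vs: bytes):
    """Split a 12-byte versionstamp into (commit_version, batch_order, user_version)."""
    commit_version = struct.unpack(">Q", vs[:8])[0]
    batch_order = struct.unpack(">H", vs[8:10])[0]
    user_version = struct.unpack(">H", vs[10:12])[0]
    return commit_version, batch_order, user_version

# Hypothetical versionstamps from two transactions: same commit version,
# different batch order => same proxy batch.
vs_a = struct.pack(">QHH", 1_000_000, 0, 0)
vs_b = struct.pack(">QHH", 1_000_000, 1, 0)

same_batch = parse_versionstamp(vs_a)[0] == parse_versionstamp(vs_b)[0]
print(same_batch)  # True
```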

Thanks!

(Jingyu Zhou) #2

I am not sure what the “commit time” on your figure is. Is it the timestamp at which the client sees the commit outcome? If so, I might have an explanation for what happened: the outcomes for a batch are sent back in order:

But these timestamps seem to be milliseconds apart, which is quite a large interval, so my interpretation may be wrong.

(Paul J Davis) #3

Thanks for the quick reply, Jingyu!

The script that generates those graphs is here:

The widths of all the bars in the gantt chart are the Latency values reported in the trace-*.xml files generated using the FDB_NETWORK_OPTION_TRACE_ENABLE and FDB_TR_OPTION_TRANSACTION_LOGGING_ENABLE options.
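In case it helps, extracting those latencies amounts to pulling the Latency attribute out of the commit events in the trace XML. A minimal sketch, using a synthetic fragment in place of a real trace file (the exact event Type names and attribute set in real traces may differ from what is assumed here):

```python
import xml.etree.ElementTree as ET

# Synthetic stand-in for a trace-*.xml file; only the Type and
# Latency attributes used below are assumed.
sample = """<Trace>
<Event Time="1.00" Type="TransactionTrace_Commit" Latency="0.0021"/>
<Event Time="1.01" Type="TransactionTrace_Commit" Latency="0.0043"/>
<Event Time="1.02" Type="TransactionTrace_GetVersion" Latency="0.0003"/>
</Trace>"""

root = ET.fromstring(sample)
commit_latencies = [
    float(e.get("Latency"))
    for e in root.iter("Event")
    if e.get("Type") == "TransactionTrace_Commit"
]
print(commit_latencies)  # [0.0021, 0.0043]
```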

And I agree, I’d expect at least some variation in the reported times, but I was surprised to see multi-millisecond differences as well.

Thanks for the code link, though! I’ll use that as another jumping-off point to start learning more parts of the code base. I don’t yet have a clear picture of where all the important bits of the code live, so it’s been a bit of a slog grepping through the codebase trying to figure out where various things are.

(Alex Miller) #4

Your commit latency for a single document update is 25ms, which seems large for just a single write? What sort of cluster are you running this test against?

I’d be really suspicious that some priority issue is what’s causing this for you. Either many roles packed into one process causing each send() to involve switching to some other task, or local contention causing the transaction end time to be recorded artificially late. That said, I suppose we don’t have a test that measures the smear of reported transaction end times in batches, so it’s possible that there could be something actually weird here.

(Paul J Davis) #5

Thanks Alex,

This is just a local single fdbserver process on OS X. I was only looking at traces to double-check that my code was doing what I intended when I noticed the odd behavior. I was surprised because I was expecting anything in a single batch to report similar-ish commit times, so I was trying to figure out where my mental model was wrong.

Also, for those document writes it’s 2 + 4 * $num_docs total writes per transaction; the text on the chart is fairly tiny there. Though that’s still low enough that I wasn’t thinking it was a factor.

I am aware that there were concerns about OS X performance, though I couldn’t find anything in the forums as to why, so I could totally believe this is just a weird effect of the dev environment I’m using. I’m not at all concerned about the absolute values here, just that the commits were spread out even when they ended up in the same batch.

Thanks!

(A.J. Beamon) #6

All the transactions from a single batch (i.e. with the same commit version) should be responded to at roughly the same time by the proxy. Jingyu posted the loop that does this, and my expectation is that it wouldn’t take very long for this loop to send replies to everyone.
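Schematically, the proxy-side pattern being described is just a loop over the batch once it has committed at a single version, so the sends themselves should contribute only microseconds of spread. The sketch below is illustrative only; the names and shapes are not the actual fdbserver code:

```python
# Illustrative model: one commit version per batch, replies sent back
# in a simple loop, each tagged with its position in the batch (which
# is what shows up as the versionstamp's batch order).
def reply_to_batch(transactions, commit_version):
    replies = []
    for order, txn in enumerate(transactions):
        replies.append((txn, commit_version, order))
    return replies

batch = ["txn-a", "txn-b", "txn-c"]
replies = reply_to_batch(batch, 42)
print(replies)
```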

On the client side, the latency is logged as soon as the client starts processing the reply. As long as one reply is being processed, however, subsequent ones will not be. I believe the response to this request will at the very least propagate all the way back up to your language binding, and depending on how you’ve written your client it could also include some of your client code. If you aren’t yourself running anything on the network thread in reaction to the commit completing, though, I wouldn’t expect this to take a particularly long time.
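To see how serial reply handling alone could produce the observed smear: even if the proxy replies to the whole batch at effectively the same instant, replies are processed one at a time on the client’s network thread, so observed completion times stack up. A toy model with a made-up per-reply cost:

```python
# Toy model of serial reply processing on the client network thread.
# All replies arrive at t=0; each takes per_reply_cost_ms of client-side
# work, so the k-th observed completion is delayed by (k+1) * cost.
per_reply_cost_ms = 1.5   # hypothetical, not a measured value
batch_size = 5

observed_end_times = [(k + 1) * per_reply_cost_ms for k in range(batch_size)]
smear_ms = observed_end_times[-1] - observed_end_times[0]
print(observed_end_times)  # [1.5, 3.0, 4.5, 6.0, 7.5]
print(smear_ms)            # 6.0
```

Under this model a 1-2ms spacing between commits would imply the client spends roughly that long handling each reply, which is why the questions below about what else the client is doing matter.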

Another possibility that Alex alluded to is that your workload could be starving the client for brief intervals between when it processes one commit response and when it starts processing the next. Is there any other activity going on in your client? It might be worth checking the ProcessMetrics client trace event to see if MainThreadCPUSeconds is large relative to Elapsed, whether the NetworkMetrics trace event has large values for S2Pri1 (an indicator of whether the event loop is taking a long time to flush), or whether you have any SlowTask trace events of relevant durations, meaning a single task ran for a while without yielding. The SlowTask events are sampled based on their duration, so depending on how long your test is, the absence of them may not be strong evidence of anything, but their presence could be interesting.
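Those checks can be scripted against the same trace files. A minimal sketch, again using a synthetic fragment; the real events carry many more attributes, and only the ones read below are assumed:

```python
import xml.etree.ElementTree as ET

# Synthetic stand-ins for the diagnostic events discussed above.
sample = """<Trace>
<Event Type="ProcessMetrics" Elapsed="5.0" MainThreadCPUSeconds="4.6"/>
<Event Type="SlowTask" Duration="0.012"/>
</Trace>"""

root = ET.fromstring(sample)
slow_tasks = []
for e in root.iter("Event"):
    if e.get("Type") == "ProcessMetrics":
        # A ratio near 1.0 means the client network thread is nearly
        # saturated and could be delaying reply processing.
        cpu_fraction = float(e.get("MainThreadCPUSeconds")) / float(e.get("Elapsed"))
        print("main thread CPU fraction:", cpu_fraction)  # roughly 0.92
    elif e.get("Type") == "SlowTask":
        slow_tasks.append(float(e.get("Duration")))

print("slow task durations (s):", slow_tasks)
```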