FoundationDB

Logging of transactions for debugging purposes


#1

I am developing a layer using the FDB C API and would like to know how I can trace the state of transactions that my client generates. Ideally, it would be possible to see what keys were affected by the transaction, too.
I’m having a transaction appear to take no effect, despite being committed and having no errors returned, and would like to debug further.

The trace log files contain transaction metrics, but no detailed breakdown of key-value store state mutations.


(Christophe Chevalier) #2

Unfortunately, there are no built-in feature to log the state of transactions client-side, and there is some logging capability of transactions server-side (used by the backup tool) but it will not produce something that I would consider usable if your intent is to develop, troubleshoot and optimize an application or a layer.

Out of necessity, I added this capability to the .NET binding, in order to measure each operations and to produce timelines of each transactions with timings, key/value results, error codes, retries, and so on. The default output is in text format that can be dumped to text files, but the raw timings could also be collected and sent in any other format (JSON, binary, …).

You can see some example in this comment (and others below): Performance characteristics of using Watches for Distributed Task Scheduling

I don’t think that other bindings have the same capability at the current time. And since you are using the C binding which is the lowest client API (most other bindings do use the C binding under the hood), your only choice is to create some sort of wrapper on top of FDBTransaction* that does this kind of logging for you.

My experience of the last 5 years, working with fdb from the application/layer side, is that this kind of tool is invaluable, and I don’t know how I would have been able to do without!.

I would recommend everyone wanting to do serious work with layers, to invest some time into this kind of tooling, because it will help you a lot in the future!


#3

That’s a shame, Christophe. Thank you for your response. It would be nice to have logging of transactions server-side, as a definitive oracle, rather than relying on individual client logging.

I’ll make a note of your .NET bindings, however. Could be of use in future.


(A.J. Beamon) #4

It doesn’t seem to be documented in all of our bindings, but there exists a transaction option that can log some details about the specific actions of selected transactions. You can see the option in our Java documentation here:

https://apple.github.io/foundationdb/javadoc/com/apple/foundationdb/TransactionOptions.html#setTransactionLoggingEnable-java.lang.String-

For the C bindings, you could look in fdb_c_options.g.h for this option.

If you enable this option for a transaction (as well as the network option to turn on trace logging), then you should see a record of the reads and mutations for that transaction in your trace log output. Events for this tracing will all have the prefix TransactionTrace_ and will include the ID that you pass to the option. There will be different events for whether the commit succeeded or returned an error, so you can use that to determine what should have been successfully written or what errors are preventing the commits.

EDIT: I had intended to also link to the 5.2 version of this source file, in which you can see the different events that will be logged for transaction tracing.


(Christophe Chevalier) #5

Good to see that there is at least a way to trace most of the events in Java! Was this introduced after the 3.x versions?

A few remarks:

  • You have to extract the relevant data from the log file generated by the binding, which makes it difficult to include the transaction logs as part of the final test output result.
  • Events include the latency which seems to be measured from inside the binding (I’m guessing), but do not include the time spent waiting in a dispatch queue or in a ThreadPool inside the application layer itself (which mean that timings are a bit optimistic)
  • Most of the events I’ve seen do not include the actual result returned by the db, but only the arguments. Makes it a bit more difficult to understand why the rest of the transaction did or did not do what was expected
  • I don’t see any event for other types of operations: snapshot reads, set, clear, atomic, versionstamped variants, watches, options configured, …
  • No clear way to see which operations were executed concurrently from operations that ran sequentially, within a transaction? Also, a transaction (as seen by the application) does not start at the first read.
  • You cannot capture or attach any details about the thread or callstack that performed a specific action (“who called this GetRange on that transaction?”).
  • Keys and arguments are dumped as raw bytes, instead of tuples. For people that can see the red dress in the matrix this may not be a big deal, but if you don’t know the tuple encoding by heart, this can become non-trivial :slight_smile:
  • No integration with the Directory Layer to simplify keys and make them more readable.

I think that some of these points require the logger to be at least in the same context as the application itself, so written in Go/Java/xyz in order to capture state relevant to the runtime. Most of the timings and events can also be captured at this level, but there are some data that is inaccessible, such as the reason why a transaction conflicted (not easy to do because the client itself does not know), or which particular storage nodes were accessed by which read operation (if you want to optimize down to that level).

This would need a deeper integration between the binding and the C client that does not exist currently.


(A.J. Beamon) #6

This option is present in all bindings, but apparently it isn’t documented for all bindings. It was introduced sometime after 3.x, though I don’t recall exactly when.

Yeah, these timings are not intended to represent the completion of any application level requests. Their purpose is more to keep tabs on what the latencies in the FDB part of the system are.

Mutations are all included as part of the commit event. I believe snapshot reads will be captured in the read events, though to determine what your conflict ranges are you’ll have to check the commit event.