Trace files (*.xml) parsing

Is there a tool to parse and present trace (*.xml) files generated by fdbserver?

Thanks

Splunk can handle these trace files; both the free and the commercial versions of Splunk can be used.

If you use Python, I have some starter code that can convert a trace XML file into a Pandas dataframe:

import xml.dom.minidom
import pandas as pd

def filter_dict(input_dict, kept_keys, sep='\n'):
    # Keep only the keys listed in "kept_keys"; fold all other key/value
    # pairs into a single "Details" string joined with "sep".
    d = {}
    for key, value in input_dict.items():
        if key in kept_keys:
            d[key] = value
        else:
            if 'Details' in d:
                d['Details'] += sep + key + ": " + value
            else:
                d['Details'] = key + ": " + value
    return d


def load_trace_file(filename, columns):
    # Parse the XML trace file and turn the attributes of each <Event>
    # element into one row of a dataframe.
    dom = xml.dom.minidom.parse(filename)
    events = dom.getElementsByTagName('Event')
    data = [filter_dict(dict(e.attributes.items()), columns) for e in events]
    return pd.DataFrame(data)

A usage example:

trace_file = "trace.000.000.000.000.0.1549822912.WhkJCk.1.xml"
columns = ['As', 'ID', 'Locality', 'Machine', 'Severity', 'Transition', 'Time', 'Type']

df = load_trace_file(trace_file, columns)
# Convert the Severity and Time columns to numeric types
df = df.astype({'Severity': int, 'Time': float})
df.head(5)
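
For instance, to keep only warning-or-worse events (assuming the standard FoundationDB severity levels, where 10 is Info, 20 is Warn, and 40 is Error):

# Filter down to events at warning level or above
warnings = df[df['Severity'] >= 20]
warnings[['Time', 'Type', 'Machine', 'Severity']].head(10)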

Get a list of unique machines and roles:

def get_roles(df):
    """df is a dataframe obtained from load_trace_file
    """
    return df['As'].dropna().unique().tolist()

def get_machines(df):
    return df['Machine'].dropna().unique().tolist()
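
A quick usage example:

roles = get_roles(df)        # role names recorded in the 'As' attribute
machines = get_machines(df)  # 'ip:port' strings from the 'Machine' attribute
print(roles)
print(machines)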

The only tool that I’ve seen to process trace files is wavefrontHQ/wavefront-fdb-tailer.

Other folks have hooked up InfluxDB+Grafana to be able to do arbitrary graphs, or have an odd collection of bespoke python scripts (like Jingyu’s) lying around.


Does the network option to emit JSON instead of XML work with fdbserver? I know it works with the clients, and that may be easier for some people to parse.

Yeah, but it’s a command line parameter in that case.

EDIT: looks like it’s --trace_format, with xml and json being the supported choices.
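
If you do switch to JSON, each event is written as one JSON object per line (at least in the builds I’ve looked at), so a rough sketch of a loader, reusing the filter_dict helper from the snippet above:

import json
import pandas as pd

def load_json_trace_file(filename, columns):
    # Assumes the json trace format writes one event object per line.
    # Values are stringified so filter_dict can concatenate them into "Details".
    rows = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = {k: str(v) for k, v in json.loads(line).items()}
            rows.append(filter_dict(event, columns))
    return pd.DataFrame(rows)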

Is there a tool that

  1. merges multiple trace files from several fdbservers into a single text file in chronological order,
  2. adds ip:port labels to that single file, and
  3. converts Times from numbers to a human-readable date and time?

The third point is addressed by https://github.com/apple/foundationdb/pull/4087/, which adds the field “DateTime”.
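
For the first two points, a minimal sketch building on the load_trace_file snippet above might look like this (it also computes a local DateTime column until that change reaches your build); treat it as a starting point rather than a polished tool:

import glob
import pandas as pd

def merge_trace_files(pattern, columns):
    # Load every matching trace file (each event already carries its ip:port
    # in the Machine attribute), sort all events chronologically, and add a
    # human-readable timestamp column.
    frames = [load_trace_file(f, columns) for f in glob.glob(pattern)]
    df = pd.concat(frames, ignore_index=True)
    df = df.astype({'Time': float}).sort_values('Time')
    df['DateTime'] = pd.to_datetime(df['Time'], unit='s')
    return df

merged = merge_trace_files("trace.*.xml", columns)
merged.to_csv("merged_trace.txt", sep='\t', index=False)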