Trace files (*.xml) parsing

Is there a tool to parse and present trace (*.xml) files generated by fdbserver?


Splunk can handle these trace files. Both a free version and a commercial version of Splunk are available.

If you use Python, I have some starter code that can convert a trace XML file into a Pandas dataframe:

import xml.dom.minidom
import pandas as pd

def filter_dict(input_dict, kept_keys, sep='\n'):
    # Keep only the keys listed in "kept_keys"; combine all other
    # key/value pairs into a single "Details" string joined by "sep".
    d = {}
    for key, value in input_dict.items():
        if key in kept_keys:
            d[key] = value
        elif 'Details' in d:
            d['Details'] += sep + key + ": " + value
        else:
            d['Details'] = key + ": " + value
    return d

def load_trace_file(filename, columns):
    dom = xml.dom.minidom.parse(filename)
    events = dom.getElementsByTagName('Event')
    data = [filter_dict(dict(e.attributes.items()), columns) for e in events]
    return pd.DataFrame(data)

A usage example:

trace_file = "trace."
columns = ['As', 'ID', 'Locality', 'Machine', 'Severity', 'Transition', 'Time', 'Type']

df = load_trace_file(trace_file, columns)
# Convert to number type for Severity & Time columns
df = df.astype({'Severity': int, 'Time': float})

Get a list of unique machines and roles:

def get_roles(df):
    """df is a dataframe obtained from load_trace_file."""
    return df['As'].dropna().unique().tolist()

def get_machines(df):
    return df['Machine'].dropna().unique().tolist()

The only tool that I’ve seen to process trace files is wavefrontHQ/wavefront-fdb-tailer.

Other folks have hooked up InfluxDB+Grafana to be able to do arbitrary graphs, or have an odd collection of bespoke Python scripts (like Jingyu’s) lying around.

Does the network option to emit JSON instead of XML work with fdbserver? I know it works with the clients, and that may be easier for some people to parse.

Yeah, but it’s a command line parameter in that case.

EDIT: looks like it’s --trace_format, with xml and json being the supported choices.
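
If you go the JSON route, loading the events gets simpler than with XML. Here is a minimal sketch, assuming the JSON trace file contains one JSON object per line and that the event fields carry the same names as the XML attributes (I have not verified either against every version). The name load_json_trace is just a hypothetical helper for illustration:

```python
import json

def load_json_trace(filename, columns):
    """Return a list of dicts, one per trace event, keeping only `columns`.

    Assumption: the file holds one JSON object per line.
    """
    rows = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            # Keep only the requested fields that are present in this event
            rows.append({k: event[k] for k in columns if k in event})
    return rows
```

The returned list can be passed straight to pd.DataFrame(...) if you want the same kind of dataframe as in the XML example.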

Is there a tool that

  1. merges multiple trace files from several fdbservers into a single text file in chronological order
  2. adds ip:port labels to the single file
  3. converts Times from numbers to a human-readable date and time?
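
The three steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not an existing tool. It assumes that each file is complete, well-formed XML with events already in increasing Time order, that Time is seconds since the Unix epoch, and that the Machine attribute holds the ip:port of the process. The function names are hypothetical:

```python
import heapq
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def read_events(filename):
    """Yield one dict of attributes per <Event> element in a trace XML file."""
    for _, elem in ET.iterparse(filename):
        if elem.tag == 'Event':
            yield dict(elem.attrib)
            elem.clear()  # release the element's contents once copied

def format_event(event):
    # Assumption: Time is seconds since the Unix epoch (shown here in UTC)
    ts = datetime.fromtimestamp(float(event['Time']), tz=timezone.utc)
    stamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')
    # Assumption: Machine holds the ip:port label of the process
    label = event.get('Machine', '?')
    rest = ' '.join(f'{k}={v}' for k, v in event.items()
                    if k not in ('Time', 'Machine'))
    return f'{stamp} [{label}] {rest}'

def merge_traces(filenames):
    """Merge several trace files into one chronologically ordered list of lines."""
    streams = [read_events(f) for f in filenames]
    # heapq.merge relies on each input stream already being sorted by Time
    merged = heapq.merge(*streams, key=lambda e: float(e['Time']))
    return [format_event(e) for e in merged]
```

Writing the result out is then just '\n'.join(merge_traces([...])) into a text file. A trace file that is still being written may lack its closing tag, in which case the XML parser will raise an error at the end of that file.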

