Trace files (*.xml) parsing

(Carles Olle) #1

Is there a tool to parse and present trace (*.xml) files generated by fdbserver?


(Jingyu Zhou) #2

Splunk can handle these trace files. Either free version or commercial version of Splunk can be obtained.

If you use Python, I have some starter code that can convert a trace XML file into a Pandas dataframe:

import xml.dom.minidom
import pandas as pd

def filter_dict(input_dict, kept_keys, sep='\n'):
    # only keep keys in the "kept_keys", combine all others to "Details"
    # with a string concatenated with "sep"
    other = ''
    d = {}
    for key, value in input_dict.items():
        if key in kept_keys:
            d[key] = value
            if 'Details' in d:
                d['Details'] += sep + key + ": " + value
                d['Details'] = key + ": " + value
    return d

def load_trace_file(filename, columns):
    dom = xml.dom.minidom.parse(filename)
    events = dom.getElementsByTagName('Event')
    data = [filter_dict(dict(e.attributes.items()), columns) for e in events]
    return pd.DataFrame(data)

A usage example:

trace_file = "trace."
columns = ['As', 'ID', 'Locality', 'Machine', 'Severity', 'Transition', 'Time', 'Type']

df = load_trace_file(trace_file, columns)
# Convert to number type for Severity & Time columns
df = df.astype({'Severity': int, 'Time': float})

Get a list of unique machines and roles:

def get_roles(df):
    """df is a dataframe obtained from load_trace_file
    return df['As'].dropna().unique().tolist()

def get_machines(df):
    return df['Machine'].dropna().unique().tolist()
(Alex Miller) #3

The only tool that I’ve seen to process trace files is wavefrontHQ/wavefront-fdb-tailer.

Other folks have hooked up InfluxDB+Grafana to be able to do arbitrary graphs, or have an odd collection of bespoke python scripts (like Jingyu’s) laying around.

1 Like
(Alec Grieser) #4

Does the network option to emit JSON instead of XML work with fdbserver? I know it works with the clients, and that may be easier for some people to parse.

(A.J. Beamon) #5

Yeah, but it’s a command line parameter in that case.

EDIT: looks like it’s --trace_format, with xml and json being the supported choices.