This project consists of several components that build on top of one another. The following table is ordered from basic libraries with supporting functionality to full applications:
Crate | Description |
---|---|
logformat | Definitions of core data types and serialization of traces (in Rust, Java). |
spark-parser | Preprocesses Spark driver and executor logs and re-writes timings into our intermediate representation. |
pag-construction | Constructs the Program Activity Graph (PAG) from a flat stream of events which denote the start/end of computation and communication. Also has scripts to generate various plots. |
snailtrail | Calculates a ranking for PAG edges by computing how many times an edge appears in the set of all-pairs shortest paths. |
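
To make the ranking idea behind snailtrail concrete, here is a minimal sketch of counting how often each edge lies on a shortest path between any pair of nodes. It is not the crate's actual implementation: the function name `rank_edges`, the unweighted toy graph, and the choice to count only one BFS shortest path per pair (ties broken by discovery order) are all assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Count how often each directed edge lies on a BFS shortest path, over all
/// ordered (source, target) pairs. Hypothetical helper, not part of snailtrail.
/// Only one shortest path per pair is counted (ties broken by discovery order).
fn rank_edges(n: usize, edges: &[(usize, usize)]) -> HashMap<(usize, usize), usize> {
    // Adjacency list for an unweighted directed graph with nodes 0..n.
    let mut adj: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(u, v) in edges {
        adj[u].push(v);
    }

    let mut counts: HashMap<(usize, usize), usize> = HashMap::new();
    for source in 0..n {
        // BFS from `source`, remembering the predecessor of each reached node.
        let mut parent: Vec<Option<usize>> = vec![None; n];
        let mut seen = vec![false; n];
        let mut queue = VecDeque::new();
        seen[source] = true;
        queue.push_back(source);
        while let Some(u) = queue.pop_front() {
            for &v in &adj[u] {
                if !seen[v] {
                    seen[v] = true;
                    parent[v] = Some(u);
                    queue.push_back(v);
                }
            }
        }
        // Walk back from every reached target and credit each edge on its path.
        for target in 0..n {
            let mut v = target;
            while let Some(u) = parent[v] {
                *counts.entry((u, v)).or_insert(0) += 1;
                v = u;
            }
        }
    }
    counts
}

fn main() {
    // Toy graph: 0 -> 1 -> 2 -> 3, plus a shortcut 0 -> 2.
    let edges = [(0, 1), (1, 2), (2, 3), (0, 2)];
    let mut ranking: Vec<_> = rank_edges(4, &edges).into_iter().collect();
    // Highest count first; break ties by edge for a stable order.
    ranking.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    for ((u, v), count) in ranking {
        println!("{} -> {}: {}", u, v, count);
    }
}
```

In this toy graph the edge 2 -> 3 ranks highest, because every path that reaches node 3 has to cross it.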
You will need the standard development tools (C compiler, version control) and we assume a Unix-like system.
- Install Rust following the usual instructions. Make sure to install Rust 1.22 (stable) or newer, because we need support for custom derive (#35900) for serialization with Abomonation.
- Compile the code (dependencies will be fetched automatically):

      $ cargo build --release --all
- Run the trace visualization tool, which allows you to inspect the traces generated by Timely/Flink/Spark instrumentation before constructing the PAG. This translates our MessagePack log files into the Google Chrome trace visualizer format:

      # Convert a sample raw log into our trace format
      $ cd spark-parser
      $ cargo run resources/app-20170324182509-0000
      # Second translation step into Chrome's tracing format
      $ cd logformat/rust
      $ cargo run --bin chromeviz -- path/to/trace.msgpack

  This will create a JSON file, which you can load in Google Chrome/Chromium by opening the URL chrome://tracing.
- Compute CP summaries using the tools in pag-construction:

      $ MODE=run ./run_all_the_things.sh
      $ MODE=summary ./run_all_the_things.sh
- Plot the CP summaries using the tools in pag-construction:

      $ MODE=summary ./run_all_the_things.sh

  Note that MODE=summary is the default configuration, so it can be left out.
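
For readers unfamiliar with the JSON that chrome://tracing loads in the visualization step above, the following sketch shows the general shape of Chrome's Trace Event Format using "complete" events (`"ph":"X"`) in the JSON array form. The `Span` struct, its field names, and `to_chrome_json` are hypothetical and chosen for illustration only; the output of the chromeviz binary may contain different or additional fields.

```rust
/// One finished activity, to be written as a "complete" (X) trace event.
/// Hypothetical type for illustration; not the logformat crate's data model.
struct Span {
    name: String,
    pid: u32,      // process id, e.g. one worker process
    tid: u32,      // thread id within that process
    start_us: u64, // start timestamp in microseconds
    dur_us: u64,   // duration in microseconds
}

/// Escape backslashes and double quotes so `s` can sit inside a JSON string.
/// Control characters are not handled in this sketch.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Render the spans as a JSON array of trace events.
fn to_chrome_json(spans: &[Span]) -> String {
    let events: Vec<String> = spans
        .iter()
        .map(|s| {
            format!(
                "{{\"name\":\"{}\",\"ph\":\"X\",\"pid\":{},\"tid\":{},\"ts\":{},\"dur\":{}}}",
                escape(&s.name),
                s.pid,
                s.tid,
                s.start_us,
                s.dur_us
            )
        })
        .collect();
    format!("[{}]", events.join(","))
}

fn main() {
    let spans = vec![
        Span { name: "processing".to_string(), pid: 1, tid: 1, start_us: 0, dur_us: 40 },
        Span { name: "communication".to_string(), pid: 1, tid: 2, start_us: 40, dur_us: 15 },
    ];
    // Redirect this output to a file and load it via chrome://tracing.
    println!("{}", to_chrome_json(&spans));
}
```

A real converter would more likely use a JSON library for serialization; the hand-written formatting here only keeps the sketch free of dependencies.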
At the moment we support the following systems:
System | Notes |
---|---|
Flink | Requires custom pre-processing, to be released |
Spark | spark-parser is required to generate traces |
Timely | Requires custom pre-processing, to be published |
TensorFlow | Tooling included |
Heron | Outputs traces, to be published |
SnailTrail is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE and LICENSE-MIT for details.