3ujohn/Spark-Log-Parser

Spark Scripts

The code in this repository is licensed under the Apache License, version 2. It depends on Python and a POSIX shell; in addition, process_logs.sh depends on DagSim.

process_logs.sh extracts information about Spark jobs, their stages, and their tasks from experimental data.

summarize.sh summarizes performance parameters of Spark runs (more precisely, of the stages of such jobs).

How to use the scripts

process_logs.sh [-p|-s|-h] directory

With process_logs.sh you can process experimental data obtained via Spark Experiment Runner. When run with the flag -p, it only extracts the compressed archives and parses the logs. With -s, it instead runs a batch of DagSim simulations, starting from the already processed data. If you do not pass any flag, the script performs both steps.
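As a sketch, the two-phase workflow described above might look like this. The experiment directory path is a placeholder, and the `-x` guard only exists so the snippet is safe to run outside the repository:

```shell
# Hypothetical experiment directory produced by Spark Experiment Runner.
DATA_DIR=./experiments/run1

if [ -x ./process_logs.sh ]; then
    # Phase 1 (-p): extract the compressed archives and parse the logs.
    ./process_logs.sh -p "$DATA_DIR"
    # Phase 2 (-s): run DagSim simulations on the already processed data.
    ./process_logs.sh -s "$DATA_DIR"
    # Or, equivalently, run both phases in one invocation:
    # ./process_logs.sh "$DATA_DIR"
fi
```

Splitting the phases is useful when you want to inspect or post-process the parsed logs before committing to a long batch of simulations.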

summarize.sh [-h] [-u number] directory

Execute summarize.sh, passing the path to the root directory. If you are considering multi-user experiments, pass the -u option with the number of users. Note that you must apply process_logs.sh -p to directory beforehand.
