Convert training graphs to shingle vectors and compute the best clustering

sbustreamspot/sbustreamspot-train

StreamSpot Train

http://www3.cs.stonybrook.edu/~emanzoor/streamspot/

Requirements

  • Anaconda for the entire scikit-learn stack.
  • GCC 5.2+ to compile the C++11 code.

Training Procedure

The following steps assume this repository has been cloned and all dependencies installed.

Convert the training data from CDM13/Avro to StreamSpot

For detailed instructions, see the sbustreamspot-cdm README.

For illustration, the steps below assume the training data is infoleak_small_units.CDM13.avro.

  • Get the StreamSpot CDM translation code: git clone https://github.com/sbustreamspot/sbustreamspot-cdm.git
  • Install its dependencies: pip install -r requirements.txt
  • Convert CDM13/Avro training data to StreamSpot edges: python translate_cdm_to_streamspot.py --url avro/infoleak_small_units.CDM13.avro --format avro --source file --concise > streamspot/infoleak_small_units.CDM13.ss
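The conversion step above can also be driven from Python when it needs to be scripted. The following is a minimal sketch, not part of the repository: the helper names build_translate_cmd and run_to_file are hypothetical, and the flags are copied from the command above. It assumes it is run from the sbustreamspot-cdm checkout.

```python
import subprocess
import sys
from pathlib import Path


def build_translate_cmd(avro_path):
    """Return the argv list for the CDM13/Avro to StreamSpot translation.

    Hypothetical helper; the flags mirror the documented command line.
    """
    return [
        sys.executable,
        "translate_cdm_to_streamspot.py",
        "--url", str(avro_path),
        "--format", "avro",
        "--source", "file",
        "--concise",
    ]


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path (the shell's '>')."""
    out_path = Path(out_path)
    # Create the output directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as out:
        # check=True raises CalledProcessError if the translator fails.
        subprocess.run(cmd, stdout=out, check=True)
```

A call such as run_to_file(build_translate_cmd("avro/infoleak_small_units.CDM13.avro"), "streamspot/infoleak_small_units.CDM13.ss") would then correspond to the shell command above.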

Convert the StreamSpot training graphs to shingle vectors

The graph-to-shingle-vector transformation is implemented in C++ for performance. It is a modified version of the streamspot-core code.

Build and run the code as follows:

cd graphs-to-shingle-vectors
make optimized
./streamspot --edges=../streamspot/infoleak_small_units.CDM13.ss --chunk-length 24 > ../shingles/infoleak_small_units.CDM13.sv
cd ..
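Note that the input and output paths in the commands above are relative to the graphs-to-shingle-vectors directory, which is why they start with "../". A rough Python equivalent, assuming a POSIX system with make on the PATH, could pass that directory as the working directory instead of using cd. The names BUILD_DIR, build_shingle_cmd and make_and_run are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Directory that holds the Makefile and the built streamspot binary.
BUILD_DIR = Path("graphs-to-shingle-vectors")


def build_shingle_cmd(edges_path, chunk_length=24):
    """Return the argv list for the streamspot binary, as invoked from BUILD_DIR."""
    return [
        "./streamspot",
        f"--edges={edges_path}",
        "--chunk-length", str(chunk_length),
    ]


def make_and_run(edges_path, out_path, chunk_length=24):
    """Build with `make optimized`, then write the binary's stdout to out_path.

    Both paths are interpreted relative to BUILD_DIR, matching the `cd`
    in the shell steps.
    """
    subprocess.run(["make", "optimized"], cwd=BUILD_DIR, check=True)
    with (BUILD_DIR / out_path).open("w") as out:
        subprocess.run(
            build_shingle_cmd(edges_path, chunk_length),
            cwd=BUILD_DIR,
            stdout=out,
            check=True,
        )
```

Calling make_and_run("../streamspot/infoleak_small_units.CDM13.ss", "../shingles/infoleak_small_units.CDM13.sv") from the repository root would mirror the four shell lines above.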

Cluster the training graph shingle vectors

Ensure the dependencies have been installed: pip install -r requirements.txt

python create_seed_clusters.py --input shingles/infoleak_small_units.CDM13.sv > clusters/infoleak_small_units.CDM13.cl

The *.cl file can then be provided to streamspot-core.
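The file names used in the steps above follow one pattern: the base name of the Avro file is kept, and each stage writes to its own directory with its own extension (.ss, .sv, .cl). The sketch below derives those paths for a different training file. The function name stage_paths is hypothetical, and the directory layout is simply the one used in the example commands.

```python
from pathlib import Path


def stage_paths(avro_name):
    """Map an Avro file name to the per-stage files used in the steps above."""
    base = Path(avro_name).name
    # Drop only the trailing ".avro"; keep inner dots such as ".CDM13".
    if base.endswith(".avro"):
        base = base[: -len(".avro")]
    return {
        "avro": Path("avro") / f"{base}.avro",
        "edges": Path("streamspot") / f"{base}.ss",
        "shingles": Path("shingles") / f"{base}.sv",
        "clusters": Path("clusters") / f"{base}.cl",
    }
```

For example, stage_paths("infoleak_small_units.CDM13.avro")["clusters"] gives the path of the *.cl file produced by the last step.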
