All of the work from my research at the University of Cincinnati

santacml/Malware-as-Video

Malware-as-Video

Hello!

This repo is all of my code from my research work at the University of Cincinnati. If you are interested, I recommend reading my thesis or papers, all of which are on my website at santacml.github.io, before trying to read this code.

My research was funded through the Air Force Research Laboratory, which means the work was very cool, but also very not-for-public-until-approved. Because of this, my research is spread out across a few private repositories. This repo is an amalgamation of all of that work, hence the one giant init commit.

This code is neither clean nor pretty, as I did not expect anyone besides myself to use it.

I plan to attempt to clean it up a bit over the next few months, but as these things go, I doubt I will get to all of it.

Root Folder

This folder contains the main scripts. "train.py" is used for training networks, and "test.py" is used for running tests on them. "testSalience.py" handles the saliency testing (NAECON 2019), which was a bit too big to throw into the testing script, and "trainKFold.py" is the same idea for k-fold training.

I attempted some modularity in "model_builder.py", which can be slightly daunting at first but makes sense once you see the pattern. Essentially, all the networks I built share the same first 2 and last 2 layers; only the middle part changes.
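To make the "shared ends, swappable middle" idea concrete, here is a minimal sketch in plain Python. It is not the code in "model_builder.py": the layer descriptions, the names `shared_head`, `shared_tail`, `MIDDLE_BUILDERS`, and `build_model`, and the two example middles are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Dict, List

# A layer is described by a plain dict here. This is a stand-in only;
# a real builder would construct framework layer objects instead.
Layer = Dict[str, object]


def shared_head() -> List[Layer]:
    """First two layers, identical for every network (placeholder values)."""
    return [
        {"kind": "input", "name": "head_0"},
        {"kind": "conv", "name": "head_1"},
    ]


def shared_tail() -> List[Layer]:
    """Last two layers, identical for every network (placeholder values)."""
    return [
        {"kind": "dense", "name": "tail_0"},
        {"kind": "output", "name": "tail_1"},
    ]


# Hypothetical registry of interchangeable middle sections.
MIDDLE_BUILDERS: Dict[str, Callable[[], List[Layer]]] = {
    "conv_stack": lambda: [
        {"kind": "conv", "name": "mid_conv_0"},
        {"kind": "conv", "name": "mid_conv_1"},
    ],
    "recurrent": lambda: [
        {"kind": "recurrent", "name": "mid_rnn_0"},
    ],
}


def build_model(middle: str) -> List[Layer]:
    """Assemble head + chosen middle + tail into one ordered layer list."""
    if middle not in MIDDLE_BUILDERS:
        raise ValueError(f"unknown middle section: {middle!r}")
    return shared_head() + MIDDLE_BUILDERS[middle]() + shared_tail()


if __name__ == "__main__":
    for name in MIDDLE_BUILDERS:
        print(name, [layer["name"] for layer in build_model(name)])
```

Keeping the middle sections in a registry means a new architecture only needs one new entry, while the shared head and tail stay defined in exactly one place.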

Quick note on pruning - I implemented custom functionality in the library "Keras-Surgeon" on GitHub. I'm trying to submit a pull request for it, but the project seems dead, so I have forked it under my own account. If you'd like to use node-distance pruning, you can find the code in that fork.

Libraries Folder

The hardest part about using this code will be getting the data working. All the code is there. The malicious code comes from the Microsoft Malware Classification Challenge, but the benign code comes from an in-house dataset we generated from a default Windows install. If you intend to use this code, I recommend starting from scratch on the dataset end, as the input format is discussed in depth in my thesis.

Both because of the file sizes and because they are classified, I cannot upload the .pklz files I use throughout my work. Good luck.
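If you rebuild the dataset yourself, you will need some way to write and read your own .pklz files. The sketch below assumes that ".pklz" means a gzip-compressed pickle, which is a common convention but is not confirmed by this README; the helper names `save_pklz` and `load_pklz` and the sample dictionary are hypothetical and do not describe the actual input format, which is covered in the thesis.

```python
import gzip
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_pklz(obj: Any, path: Path) -> None:
    """Write obj as a gzip-compressed pickle (assumed meaning of .pklz)."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pklz(path: Path) -> Any:
    """Read a gzip-compressed pickle back into a Python object.

    Only unpickle files you created or fully trust: unpickling can
    execute arbitrary code.
    """
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Round-trip a tiny placeholder object through a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "sample.pklz"
        save_pklz({"samples": [[0, 1], [2, 3]], "labels": [0, 1]}, target)
        print(load_pklz(target))
```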

Besides the data, this folder contains various classes and helpful functions I needed over the years.

Leflow Folder

This folder contains the materials needed to deploy the network to an FPGA, as detailed in my thesis, using the LeFlow library on GitHub.

Networks Folder

Contains all the final networks from my thesis.
