This repository hosts the implementation code for a Master's thesis. The work evaluates the effectiveness of obfuscation and software integrity protection (SIP) schemes against machine-learning-based attacks.
The image below summarizes the results:
- /sip_ml_pipeline - Contains the entire ML pipeline, from data generation to rendering the result charts
- /notebooks - Interactive notebooks for data examination
- /code2vec - Reference to external code embedding component
- /diagrams - Draw.io diagram sources (XML files)
python3 -m venv venv &&
source venv/bin/activate &&
pip install -r requirements.txt
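After running the setup commands, it can be useful to confirm that the interpreter in use really is the one from the virtual environment before installing or training anything. The snippet below is a minimal sketch using only the standard library; the helper name `in_virtualenv` is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```

If this prints `False` after `source venv/bin/activate`, the shell is most likely still resolving a different `python3` than the one in `venv/bin`.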
The full training data, including features, splits, and results, is ~500GB. The Raw Data only includes the source
programs, without preprocessing or feature extraction. The Results Data only contains the result .json
files and model weights.
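Given the size of the full data set, a quick free-space check before downloading can save a failed transfer. The following is a rough sketch, not part of the repository: the function name `has_room` and the constant `REQUIRED_BYTES` are hypothetical, and the ~500GB figure is taken from the description above and interpreted as 500 × 10^9 bytes, which is an assumption.

```python
import shutil

# ~500 GB as stated above; treating 1 GB as 10**9 bytes is an assumption.
REQUIRED_BYTES = 500 * 10**9


def has_room(path: str = ".", required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 10**9
    print(f"free space: {free_gb:.1f} GB, enough for full data: {has_room()}")
```

Pass a different `path` to check the drive the data will actually be stored on, and a smaller `required` value when only the Raw Data or Results Data is needed.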