binarycode-classifier

binary code classifier with deeplearning models(resnet, char-cnn).

dataset : 1.6 million binary code of malwares crawled from virus total, 1 million binary code of whitelist(non malware) binary code

실험명	데이터셋, preprocessing	결과	설명
Resnet	PE(train-7:eval-3), thumbnail	85.84%
Resnet with whitelist	PE(7:3), thumbnail	63.02%
distilling nn(model comp.)	PE(7:3), thumbnail	83.60%	base model : resnet
distilling nn(model ensemble)	PE(7:3), thumbnail	69.50%	base model : resnet
MPOS	PE(7:3), 32*24000	89.87%
MPOS with simhash	PE(7:3) 32*24000	79.30%	16bit simhashing with 32 byte block
Compare label variance	PE(7:3)	83%	각 엔진별 label에 대한 xgboost 결과
apk classification with api call	apk call feature	96%
xgboost with whitelist	PE + whitelist 7:3	91.8%
MPOS with whitelist	PE + whitelist 7:3	88%

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
models		models
script		script
tracing		tracing
DBLoader.py		DBLoader.py
README.md		README.md
__init__.py		__init__.py
batch_loader.py		batch_loader.py
config.py		config.py
main.py		main.py

Provide feedback