---
layout: default
width:
  - col-xs-12
  - col-md-11 col-md-offset-1
  - col-lg-10
sections:
  - title: Cite This Work
    link: "#paper"
  - title: Instruction Embedding
    link: "#embed"
  - title: Siamese Model
    link: "#model"
---
<div class="container-fluid">
<!-- ================= PAPER SECTION ====================== -->
<div class="row sect">
<div id="paper" class="{{ page.width | join: ' ' }}">
<div class="page-header">
<h2>Cite This Work</h2>
</div> <!-- /.page-header -->
<p>Our paper was published at the Network and Distributed System Security Symposium (NDSS) 2019. If you use the provided resources for academic research, we encourage you to cite the following paper.
</p>
<p><a href="https://www.ndss-symposium.org/ndss-paper/neural-machine-translation-inspired-binary-code-similarity-comparison-beyond-function-pairs/">Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs</a>.</p>
<blockquote>
@inproceedings{zuo2019neural,
<br>
title={Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs},
<br>
author={Zuo, Fei and Li, Xiaopeng and Young, Patrick and Luo, Lannan and Zeng, Qiang and Zhang, Zhexin},
<br>
booktitle={Proceedings of the 2019 Network and Distributed Systems Security Symposium (NDSS)},
<br>
year={2019}
}
</blockquote>
<small>* We previously submitted the paper to NDSS 2018 in August 2017 and to S&amp;P 2019 in May 2018;
it was finally accepted to NDSS 2019 after significant improvement.
The main NMT-inspired idea, however, remains the same.
Our NDSS 2018 submission is available <a href="https://cse.sc.edu/~zeng1/papers/332-NDSS-2018.pdf">here</a>.</small>
</div><!-- /paper -->
</div><!-- /.sect -->
<div class="row sect">
<div id="embed" class="{{ page.width | join: ' ' }}">
<div class="page-header">
<h2>Instruction Embedding</h2>
</div> <!-- /.page-header -->
<h3>Prerequisites</h3>
<p>Make sure the following packages and libraries (and any dependencies they require) are installed on your computer:</p>
<ol>
<li>
<a href="https://radimrehurek.com/gensim/">gensim</a>
</li>
<li>
<a href="http://www.nltk.org/">NLTK</a>
</li>
<li>
<a href="http://scikit-learn.org/stable/index.html">Scikit-learn</a>
</li>
<li>
TensorFlow
</li>
<li>
Keras (version ≤ 2.1.4)
</li>
</ol>
<h3>Download Embedding Files</h3>
<p>We have trained three sets of instruction embeddings, with dimensions 50, 100, and 150.
To use our Siamese-network-based tool for binary similarity detection,
first download them from this <a href="https://www.dropbox.com/sh/om0uvkruwj86fuh/AABOkWA5VD7S_WCdQV5D7eq1a?dl=0">link</a>.
</p>
<p>Ideally, every instruction has an embedding in the pre-trained .w2v files.
Any instruction that does not (an unknown word) is replaced with a zero vector.
</p>
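<p>The zero-vector fallback described above can be sketched as follows. This is a minimal illustration, not the released tool: the function name <code>embed_instructions</code>, the toy table, and the instruction token strings are hypothetical, and a plain dictionary stands in for whatever object the downloaded .w2v file is loaded into.</p>

```python
from typing import Dict, List, Sequence


def embed_instructions(instructions: Sequence[str],
                       table: Dict[str, List[float]],
                       dim: int) -> List[List[float]]:
    """Map each instruction to its vector; unknown instructions become zeros."""
    zero = [0.0] * dim
    # Copy every vector so callers can modify the result without
    # touching the lookup table or the shared zero vector.
    return [list(table.get(ins, zero)) for ins in instructions]


# Hypothetical two-entry table with 3-D vectors, for illustration only.
# The real embeddings have 50, 100, or 150 dimensions.
toy_table = {
    "push~rbp": [0.1, 0.2, 0.3],
    "mov~rbp,rsp": [0.4, 0.5, 0.6],
}

vectors = embed_instructions(
    ["push~rbp", "unseen~instr", "mov~rbp,rsp"], toy_table, dim=3
)
print(vectors[1])  # [0.0, 0.0, 0.0]
```

<p>The second instruction is absent from the table, so it maps to a zero vector of the same dimension as the known embeddings.</p>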
</div><!-- /embed -->
</div><!-- /.sect -->
<div class="row sect">
<div id="model" class="{{ page.width | join: ' ' }}">
<div class="page-header">
<h2>Siamese Model</h2>
</div>
<h3>Input Data</h3>
<p>The Siamese-network-based binary similarity detector accepts only pre-processed data.
To demonstrate how the Siamese model is used, we provide some input samples (e.g., test_set_O2.csv), which you can download from this <a href="https://github.com/nmt4binaries/nmt4binaries.github.io/tree/master/download/">link</a>.
</p>
<h3>Pre-trained Model</h3>
<p>As an example, we provide pre-trained model weights for a Siamese-network-based binary similarity detector; click the <a href="https://github.com/nmt4binaries/nmt4binaries.github.io/tree/master/download/">link</a> to download them.
Each sub-network in this Siamese network is a two-layer LSTM that takes a 100-dimensional vector (one instruction embedding) as input at each step. Please refer to the example Python script to re-run the test case.
</p>
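<p>To illustrate the Siamese arrangement described above, the sketch below passes both inputs through one shared encoder and compares the two encodings. It is only a sketch under stated assumptions: <code>mean_pool</code> is a placeholder for the two-layer LSTM sub-network, the exp(-L1) similarity score is an assumed choice rather than something this page specifies, and the 2-D toy vectors stand in for the 100-D instruction embeddings.</p>

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(sequence: Sequence[Vector]) -> Vector:
    """Placeholder encoder: average the instruction vectors of one input.

    Assumption: this stands in for the two-layer LSTM sub-network.
    The sequence must contain at least one vector.
    """
    dim = len(sequence[0])
    return [sum(vec[i] for vec in sequence) / len(sequence) for i in range(dim)]


def siamese_similarity(left: Sequence[Vector],
                       right: Sequence[Vector],
                       encoder: Callable[[Sequence[Vector]], Vector] = mean_pool) -> float:
    """Encode both inputs with the same encoder and score them in (0, 1]."""
    h_left, h_right = encoder(left), encoder(right)
    l1 = sum(abs(a - b) for a, b in zip(h_left, h_right))
    # Assumption: exp(-L1 distance) as the score; 1.0 means identical encodings.
    return math.exp(-l1)


# Toy inputs: each inner list is one (hypothetical) instruction embedding.
block_a = [[0.1, 0.2], [0.3, 0.4]]
block_b = [[0.1, 0.2], [0.3, 0.4]]
block_c = [[0.9, 0.8], [0.7, 0.6]]

print(siamese_similarity(block_a, block_b))  # 1.0
print(siamese_similarity(block_a, block_c))
```

<p>Because the same encoder processes both sides, identical inputs always score 1.0, and the score falls toward 0 as the encodings move apart.</p>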
</div> <!-- /model -->
</div> <!-- /.sect -->
</div><!-- /.container-fluid -->