---
layout: default
width:
  - col-xs-12
  - col-md-11 col-md-offset-1
  - col-lg-10
sections:

  - title: Cite This Work
    link: "#paper"

  - title: Instruction Embedding
    link: "#embed"

  - title: Siamese Model
    link: "#model"

---

<div class="container-fluid">

  <!-- ================= PAPER SECTION ====================== -->
   <div class="row sect">
    <div id="paper" class="{{ page.width | join: ' ' }}">
      <div class="page-header">
        <h2>Cite This Work</h2>
      </div> <!-- /.page-header -->

      <p>Our paper appeared at the Network and Distributed Systems Security Symposium (NDSS) 2019. If you use the resources provided here for academic research, please cite the following paper.
      </p>
      <p><a href="https://www.ndss-symposium.org/ndss-paper/neural-machine-translation-inspired-binary-code-similarity-comparison-beyond-function-pairs/">Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs</a>.</p>

      
      <blockquote>
      @inproceedings{zuo2019neural,
        <br>
      title={Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs},
        <br>
      author={Zuo, Fei and Li, Xiaopeng and Young, Patrick and Luo, Lannan and Zeng, Qiang and Zhang, Zhexin},
        <br>
      booktitle={Proceedings of the 2019 Network and Distributed Systems Security Symposium (NDSS)},
        <br>
      year={2019}
      }
      </blockquote>

      <small>* We previously submitted the paper to NDSS 2018 in August 2017 and to S&amp;P 2019 in May 2018; 
      it was finally accepted to NDSS 2019 after significant improvements. 
      The main NMT-inspired idea, however, has remained the same. 
      <a href="https://cse.sc.edu/~zeng1/papers/332-NDSS-2018.pdf">Here</a> is our NDSS 2018 submission.</small>
      
    </div><!-- /paper -->
  </div><!-- /.sect -->

  <div class="row sect">
    <div id="embed" class="{{ page.width | join: ' ' }}">
      <div class="page-header">
        <h2>Instruction Embedding</h2>
      </div> <!-- /.page-header -->

      
      <h3>Prerequisites</h3>
      <p>Make sure the following packages and libraries (including their dependencies, where necessary) are installed on your computer:</p>
      <ol>
        <li>
          <a href="https://radimrehurek.com/gensim/">gensim</a>
        </li>
        <li>
          <a href="http://www.nltk.org/">NLTK</a>
        </li>
        <li>
          <a href="http://scikit-learn.org/stable/index.html">Scikit-learn</a>
        </li>
        <li>
          TensorFlow
        </li>
        <li>
          Keras (version ≤ 2.1.4)
        </li>
      </ol>
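      <p>As a quick sanity check for the list above, the following Python sketch reports which packages cannot be imported and tests a version string against the Keras ≤ 2.1.4 constraint. The import names (<code>gensim</code>, <code>nltk</code>, <code>sklearn</code>, <code>tensorflow</code>, <code>keras</code>) and the helper names <code>missing_packages</code> and <code>version_at_most</code> are assumptions made for illustration; they are not part of the released code.</p>

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED = ["gensim", "nltk", "sklearn", "tensorflow", "keras"]
KERAS_MAX = "2.1.4"  # upper bound stated in the prerequisites


def missing_packages(names):
    """Return the names that cannot be located as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def version_at_most(installed, limit):
    """Compare dotted versions numerically; non-numeric components are ignored."""
    def parse(version):
        return tuple(int(part) for part in version.split(".") if part.isdigit())
    return parse(installed) <= parse(limit)


# Report anything that still needs to be installed.
print("missing packages:", missing_packages(REQUIRED) or "none")
# Check a sample version string against the Keras upper bound.
print("2.1.4 accepted:", version_at_most("2.1.4", KERAS_MAX))
```

      <p>Once Keras is installed, passing <code>keras.__version__</code> as the first argument of <code>version_at_most</code> checks the actual installation.</p>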
      
      <h3>Download Embedding Files</h3>
      <p>We have trained three sets of instruction embeddings, with dimensions 50, 100, and 150. 
        To use our Siamese-network-based tool for binary similarity detection, 
        first download them from this <a href="https://www.dropbox.com/sh/om0uvkruwj86fuh/AABOkWA5VD7S_WCdQV5D7eq1a?dl=0">link</a>.
      </p>
      <p>Ideally, every instruction has an embedding in the pre-trained .w2v files. 
        Any instruction that does not is treated as an unknown word and replaced with the zero vector.
      </p>
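      <p>The zero-vector fallback described above can be sketched in plain Python as follows. A small dictionary stands in for the table loaded from a .w2v file. The function name <code>embed_instructions</code>, the sample tokens, and the 3-dimensional vectors are all hypothetical; the released embeddings have 50, 100, or 150 dimensions.</p>

```python
def embed_instructions(tokens, table, dim):
    """Map each instruction token to its vector, or to a zero vector if unknown."""
    vectors = []
    for token in tokens:
        vector = table.get(token)
        if vector is None:
            vector = [0.0] * dim  # unknown instruction: all-zero embedding
        vectors.append(list(vector))
    return vectors


# Toy table standing in for a loaded embedding file (hypothetical values).
toy_table = {"mov_eax_ebx": [0.1, 0.2, 0.3], "ret": [0.4, 0.5, 0.6]}
embedded = embed_instructions(["mov_eax_ebx", "unseen_op", "ret"], toy_table, dim=3)
print(embedded)
```

      <p>With a real embedding file, <code>table</code> would be a dictionary built from the loaded vectors, and <code>dim</code> must match the dimension of the file you downloaded.</p>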


    </div><!-- /embed -->
  </div><!-- /.sect -->

  <div class="row sect">
    <div id="model" class="{{ page.width | join: ' ' }}">
      <div class="page-header">
        <h2>Siamese Model</h2>
      </div>
      
        <h3>Input Data</h3>
        <p>The Siamese-neural-network-based binary similarity detector accepts only properly pre-processed data. 
          To demonstrate how the Siamese model is used, we provide some input samples (e.g., test_set_O2.csv), which you can download from this <a href="https://github.com/nmt4binaries/nmt4binaries.github.io/tree/master/download/">link</a>.
        </p>
      
      <h3>Pre-trained Model</h3>
      <p>As an example, we provide pre-trained model weights for a Siamese-network-based binary similarity detector; click this <a href="https://github.com/nmt4binaries/nmt4binaries.github.io/tree/master/download/">link</a> to download them.
        Each sub-network of this Siamese network is a two-layer LSTM that takes 100-dimensional vectors (one per instruction embedding) as input. To re-run the test case, please refer to the example Python script.
      
      </p>


    </div>   <!-- /model -->
  </div> <!-- /.sect -->
</div><!-- /.container-fluid -->