BERT Based Topic-Specific Crawler

This paper presents a multi-threaded web crawler that uses Sentence-BERT (S-BERT), a sentence-embedding variant of Bidirectional Encoder Representations from Transformers. S-BERT is used to calculate the similarity between a set of predefined classes (topics) and the text of each downloaded web page.
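As a rough illustration of the similarity step, the sketch below compares one page embedding against one embedding per topic using cosine similarity, a common choice for S-BERT embeddings. It is not taken from the notebook: the function names, the `threshold` value, and the toy 3-dimensional vectors are all assumptions made for the example. In practice the vectors would come from an S-BERT model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "not similar"
    return dot / (norm_a * norm_b)


def best_topic(page_vec, topic_vecs, threshold=0.5):
    """Return (topic, score) for the closest topic.

    The topic is None when even the best score is below the threshold
    (hypothetical cut-off; the real crawler settings may differ).
    """
    best_name, best_score = None, -1.0
    for name, vec in topic_vecs.items():
        score = cosine_similarity(page_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy embeddings standing in for S-BERT output (made up for illustration).
topic_vecs = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.2],
}
page_vec = [0.8, 0.2, 0.1]

topic, score = best_topic(page_vec, topic_vecs)
print(topic, round(score, 3))
```

A page whose best score falls below the threshold is reported with no topic, which is one simple way for a crawler to decide that a page is off-topic.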


Authors

  • Yahya Tawil

  • Saed Alqaraleh

ASYU 2021 Conference Paper Presentation

Available on YouTube.

Organization

This repo is organized as follows:

  • Code: contains downloader_V_0_10.ipynb, the crawler code, written and tested in Google Colab, and evaluator.ipynb, a simple notebook that evaluates the crawler's results (in the form of a CSV file) by calculating the true positives and false positives for each topic.

  • Report: the report as a PDF and as LaTeX source.

  • Results: CSV files showing example output after running the code for several hours.
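The evaluation described under Code can be sketched roughly as follows. This is a minimal illustration and not the contents of evaluator.ipynb: the column names `predicted_topic` and `true_topic`, the function name `count_tp_fp`, and the sample rows are all hypothetical, and the real CSV layout may differ. A row counts as a true positive for its predicted topic when the prediction matches the label, and as a false positive for that topic otherwise.

```python
import csv
import io
from collections import defaultdict


def count_tp_fp(rows):
    """Count true and false positives per predicted topic.

    `rows` is any iterable of dicts with the (assumed) keys
    'predicted_topic' and 'true_topic'.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0})
    for row in rows:
        predicted = row["predicted_topic"]
        actual = row["true_topic"]
        if predicted == actual:
            counts[predicted]["tp"] += 1
        else:
            counts[predicted]["fp"] += 1
    return dict(counts)


# Made-up sample data; a real run would open one of the result CSV files.
sample_csv = """url,predicted_topic,true_topic
http://example.com/a,sports,sports
http://example.com/b,sports,finance
http://example.com/c,finance,finance
"""

results = count_tp_fp(csv.DictReader(io.StringIO(sample_csv)))
for topic, c in sorted(results.items()):
    precision = c["tp"] / (c["tp"] + c["fp"])
    print(topic, c, round(precision, 2))
```

To run this against a file instead of the inline sample, pass `csv.DictReader(open(path, newline=""))` with column names adjusted to match the actual CSV header.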

Instructions

To run the code, open downloader_V_0_10.ipynb and run it directly in Google Colab. Before running, set the crawler options by editing the conf dictionary in the code.

Cite This

@INPROCEEDINGS{9599076,
  author={Tawil, Yahya and Alqaraleh, Saed},
  booktitle={2021 Innovations in Intelligent Systems and Applications Conference (ASYU)}, 
  title={BERT Based Topic-Specific Crawler}, 
  year={2021},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/ASYU52992.2021.9599076}}