This paper presents a multi-threaded web crawler that uses Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is used to calculate the similarity between a set of predefined classes and the text of each downloaded web page.
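The similarity step described above can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes cosine similarity as the measure, and it uses a toy bag-of-words encoder as a stand-in for S-BERT so that it runs without downloading a model. The names `toy_encode`, `cosine`, `classify_page`, and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

Vector = Dict[str, float]


def toy_encode(text: str) -> Vector:
    """Stand-in encoder (NOT S-BERT): a sparse bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_page(
    page_text: str,
    classes: Dict[str, str],
    encode: Callable[[str], Vector] = toy_encode,
    threshold: float = 0.2,  # assumed cut-off, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Return the best-matching class name and its score, or (None, score)
    when no class reaches the threshold."""
    page_vec = encode(page_text)
    scores = {name: cosine(page_vec, encode(desc)) for name, desc in classes.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Hypothetical class descriptions, for illustration only.
classes = {
    "sports": "football match team players score",
    "finance": "stock market bank investment money",
}
label, score = classify_page("the team won the football match", classes)
print(label, round(score, 3))
```

To use a real S-BERT model, both `toy_encode` and `cosine` would need to be replaced with dense-vector equivalents, for example the `encode` method of a `SentenceTransformer` model from the sentence-transformers package.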
- Yahya Tawil.
- Dr. Saed ALQARALEH.
Available on YouTube.
This repo is organized as follows:
- Code: this folder contains downloader_V_0_10.ipynb, the crawler code, written and tested using Google Colab, and evaluator.ipynb, a simple script that evaluates the crawler's results (in the form of a CSV file) by calculating the true positives and false positives for each topic.
- Report: a report as both PDF and LaTeX source.
- Results: CSV files as examples of the output produced after running the code for hours.
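The per-topic evaluation mentioned for evaluator.ipynb can be sketched as below. This is an assumption-laden illustration rather than the notebook's code: the column names `predicted_topic` and `actual_topic` and the function `count_tp_fp` are hypothetical, and the actual CSV layout in the Results folder may differ.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def count_tp_fp(
    csv_file: TextIO,
    predicted_col: str = "predicted_topic",  # hypothetical column name
    actual_col: str = "actual_topic",        # hypothetical column name
) -> Dict[str, Dict[str, int]]:
    """Count true positives and false positives for each predicted topic."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"tp": 0, "fp": 0})
    for row in csv.DictReader(csv_file):
        predicted = row[predicted_col]
        if predicted == row[actual_col]:
            counts[predicted]["tp"] += 1  # page assigned to its correct topic
        else:
            counts[predicted]["fp"] += 1  # page assigned to the wrong topic
    return dict(counts)


# Made-up rows standing in for a crawler result file.
sample = io.StringIO(
    "url,predicted_topic,actual_topic\n"
    "u1,sports,sports\n"
    "u2,sports,finance\n"
    "u3,finance,finance\n"
    "u4,finance,finance\n"
)
for topic, result in count_tp_fp(sample).items():
    print(topic, result)
```

For a real result file, pass an open file handle instead of the `io.StringIO` sample and adjust the two column names to match the CSV header.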
To run the code, open downloader_V_0_10.ipynb and run it directly in Google Colab. Make sure to select the right settings by changing the conf dictionary in the code.
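As a rough picture of how a settings dictionary can drive a multi-threaded download loop, consider the sketch below. The keys shown (`num_threads`, `seed_urls`, `max_pages`) are hypothetical and are not taken from the notebook; check the actual conf dictionary in downloader_V_0_10.ipynb for the real names. The `fake_fetch` function is a stand-in for an HTTP download so that the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical settings; the real keys live in the notebook's conf dictionary.
conf: Dict[str, Any] = {
    "num_threads": 4,
    "seed_urls": ["https://example.com/a", "https://example.com/b"],
    "max_pages": 10,
}


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP download so the sketch runs without a network."""
    return f"contents of {url}"


def crawl(
    settings: Dict[str, Any],
    fetch: Callable[[str], str] = fake_fetch,
) -> Dict[str, str]:
    """Download the seed URLs in parallel using a pool of worker threads."""
    urls: List[str] = settings["seed_urls"][: settings["max_pages"]]
    with ThreadPoolExecutor(max_workers=settings["num_threads"]) as pool:
        pages = list(pool.map(fetch, urls))  # map preserves input order
    return dict(zip(urls, pages))


pages = crawl(conf)
print(len(pages), "pages downloaded")
```

A real crawler would also extract links from each page and feed them back into the queue; that part is omitted here.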
@INPROCEEDINGS{9599076,
author={Tawil, Yahya and Alqaraleh, Saed},
booktitle={2021 Innovations in Intelligent Systems and Applications Conference (ASYU)},
title={BERT Based Topic-Specific Crawler},
year={2021},
volume={},
number={},
pages={1-5},
doi={10.1109/ASYU52992.2021.9599076}}