These programs are designed to categorize specific websites.
(Powered by Yandex.Translate: )
Install GoogleScraper:
pip3 install GoogleScraper
(GoogleScraper requires Python3.)
Open a command prompt and enter the following:
GoogleScraper -m http -s "bing" --keyword-file websites.txt > output.txt
(You will need a file called websites.txt containing the list of websites you want to search.)
Then process output.txt with the following command:
python2.7 output.txt output_from_extract.txt
(The first argument after '' should be the name of the input file to process, that is, the output of the GoogleScraper run. The second argument should be the name of the output file you want to create. You will need to put the file in the same directory as
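The two-argument convention described above (input file first, output file second) can be sketched as follows. This is a minimal Python 3 illustration, not the actual script: the script name extract.py and the functions parse_args and main are hypothetical, and the line-copying loop is only a stand-in for the real extraction logic, which is not described here.

```python
import sys


def parse_args(argv):
    """Return (input_path, output_path) from a command line such as
    ["extract.py", "output.txt", "output_from_extract.txt"]."""
    # argv[0] is the script's own name; the two file names follow it.
    if len(argv) != 3:
        raise SystemExit("usage: {} INPUT_FILE OUTPUT_FILE".format(argv[0]))
    return argv[1], argv[2]


def main(argv=None):
    input_path, output_path = parse_args(sys.argv if argv is None else argv)
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Hypothetical stand-in for the real extraction step:
            # copy every non-blank line of the input unchanged.
            if line.strip():
                dst.write(line)
```

Calling main() at the bottom of such a script, under an if __name__ == "__main__": guard, would make python3 extract.py output.txt output_from_extract.txt read output.txt and write output_from_extract.txt.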
(This will find the titles of webpages in Common Crawl.) Open a command prompt and enter the following:
python2.7 output.txt
(The first argument after '' should be the name of the output file you want to create.)
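How the program retrieves pages from Common Crawl is not described here, so the sketch below covers only the last step of the title-finding process: pulling the title out of a page's HTML. It is a Python 3 illustration that swaps in the standard-library html.parser in place of BeautifulSoup so it runs without extra packages; the names TitleParser and extract_title are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is used.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Title text may arrive in several chunks, so keep them all.
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```

With BeautifulSoup installed, BeautifulSoup(html, "html.parser").title gives the same element directly; the hand-written parser just avoids the dependency.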
Install the prerequisites:
pip2 install BeautifulSoup4
pip2 install requests
Create a text file called websites.txt listing all of the websites you want to search for. If you want to use a different filename, update the script so that it references the correct filename. Then run the script.
This outputs each result as the website address followed by its category.
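The input and output handling described above might look roughly like the following Python 3 sketch. It is an assumption-laden illustration, not the program's code: INPUT_FILE, categorize, format_result, and categorize_file are hypothetical names, categorize is a placeholder because the categorization logic is not described here, and the single space between address and category is an assumed layout.

```python
INPUT_FILE = "websites.txt"  # change this if your list uses another filename


def categorize(website):
    # Hypothetical placeholder: the real categorization logic is not
    # described in this document, so every site gets the same label.
    return "uncategorized"


def format_result(website, category):
    # Assumed layout: the website address, a space, then its category.
    return "{} {}".format(website, category)


def categorize_file(path=INPUT_FILE):
    """Return one result line per non-blank line of the input file."""
    with open(path, encoding="utf-8") as f:
        websites = [line.strip() for line in f if line.strip()]
    return [format_result(site, categorize(site)) for site in websites]
```

Keeping the filename in a single constant is what makes the "update the script to reference a different filename" step a one-line change.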
Create a text file called websites.txt listing all of the websites you want to search, then run the script on it. You will need to install BeautifulSoup and GoogleScraper as described above for the Title Searcher and Description Searcher programs. This program outputs two lists: the sites identified as aggregators and the sites not identified as aggregators.
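Producing the two lists amounts to partitioning the searched sites with some yes/no test. The Python 3 sketch below shows that partition; the function names are hypothetical, and description_says_aggregator is a toy rule for illustration only, since the program's actual criterion for flagging an aggregator is not described here.

```python
def split_by_aggregator(websites, is_aggregator):
    """Partition websites into (aggregators, non_aggregators), keeping input order.

    is_aggregator is any function that takes a website and returns True or False.
    """
    aggregators, non_aggregators = [], []
    for site in websites:
        if is_aggregator(site):
            aggregators.append(site)
        else:
            non_aggregators.append(site)
    return aggregators, non_aggregators


def description_says_aggregator(description):
    # Hypothetical toy rule: flag a site whose scraped description contains
    # "aggregat" (aggregator, aggregates, aggregation), ignoring case.
    return "aggregat" in description.lower()
```

For example, with descriptions = {"a.example": "A news aggregator", "b.example": "Local bakery"}, calling split_by_aggregator(list(descriptions), lambda s: description_says_aggregator(descriptions[s])) returns (["a.example"], ["b.example"]).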
Create a text file called websites.txt listing all of the websites you want to search, then run the script on it. This program outputs the Twitter handles and Facebook pages found for the searched websites. At this stage, the output still requires some minor manual editing.
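The program's method for finding these accounts is not described here. One common approach, sketched below in Python 3 as an assumption, scans a page's HTML for twitter.com and facebook.com links with regular expressions. The patterns and the NOT_PROFILES set are hypothetical; share buttons and widgets also produce matching links, which illustrates why the output can still need editing by hand.

```python
import re

# Hypothetical patterns for profile links such as https://twitter.com/example
# and https://www.facebook.com/examplepage; Twitter handles are 1-15 word characters.
TWITTER_RE = re.compile(r"https?://(?:www\.)?twitter\.com/([A-Za-z0-9_]{1,15})\b", re.IGNORECASE)
FACEBOOK_RE = re.compile(r"https?://(?:www\.)?facebook\.com/([A-Za-z0-9.\-]+)", re.IGNORECASE)

# Hypothetical list of path segments that belong to share buttons or widgets,
# not profiles; real pages contain others, so manual review is still needed.
NOT_PROFILES = {"share", "intent", "home", "search", "hashtag",
                "sharer", "sharer.php", "plugins", "dialog"}


def _keep_profiles(names):
    """Drop known non-profile segments and duplicates, preserving order."""
    kept = []
    for name in names:
        if name.lower() not in NOT_PROFILES and name not in kept:
            kept.append(name)
    return kept


def find_social_links(html):
    """Return (twitter_handles, facebook_pages) found in the page's HTML."""
    handles = _keep_profiles(m.group(1) for m in TWITTER_RE.finditer(html))
    pages = _keep_profiles(m.group(1) for m in FACEBOOK_RE.finditer(html))
    return handles, pages
```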