
Tianw22/Crawlers


Crawlers

Host Advice Crawler

Web crawler for hostadvice.com.
=> First, download the web driver for your browser and add it to PATH.
=> Open a terminal and run: python HostadviceCrawler.py
=> Press ENTER, then immediately enter the country to check and press ENTER again.
=> The browser opens automatically. Keep it open and topmost; the script then works for you, hands-free.
=> The .csv output is saved in the same folder as the .py file.
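Before starting the crawler, it can help to confirm that the driver executable actually resolves on PATH. The sketch below is a minimal check under the assumption that the driver is named `chromedriver` (Chrome) or `geckodriver` (Firefox); the helper `find_driver` is hypothetical and is not part of HostadviceCrawler.py.

```python
import shutil
from typing import Iterable, Optional


def find_driver(names: Iterable[str] = ("chromedriver", "geckodriver")) -> Optional[str]:
    """Return the full path of the first driver executable found on PATH, or None.

    The default names are assumptions; change them to match your browser's driver.
    """
    for name in names:
        path = shutil.which(name)  # searches every directory listed in PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    driver = find_driver()
    if driver is None:
        print("No web driver found on PATH; download one and add its folder to PATH.")
    else:
        print(f"Web driver found at {driver}")
```

If this prints that no driver was found, the crawler will most likely fail to open the browser as well.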

CDN list Crawler

Web crawler for cdnlist.com.
=> python CDNListCrawler.py
=> pip install git+https://github.com/abenassi/Google-Search-API
=> python CDNList-GoogleSearch.py

Canadian Chinese Importers

Web crawler for Chinese importers in the Canadian market. => python importers.py

ITW Companies

python nametosheet.py => The script has two kinds of output; uncomment the one you need.

PDF Crawler

python PDFCrawler.py
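The README does not describe what PDFCrawler.py does beyond its name. Assuming its job is to collect PDF links from a web page, a minimal standard-library sketch of that step could look like the following; `PDFLinkParser` and `find_pdf_links` are hypothetical names, not taken from the script.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class PDFLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None if the anchor has no href
        # Check only the path, so query strings and fragments do not hide the extension
        if href and urlparse(href).path.lower().endswith(".pdf"):
            # Resolve relative links against the page URL
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html: str, base_url: str) -> List[str]:
    """Return all PDF links found in an HTML string, in document order."""
    parser = PDFLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links
```

For example, on a page at `https://example.com/list/`, the link `/docs/report.pdf` resolves to `https://example.com/docs/report.pdf`. Each collected URL could then be downloaded with `urllib.request.urlretrieve`; that step is left out here so the sketch runs offline.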

Find Domain from Google

python FindDomain-GoogleSearch.py => Given company names, roughly find their domains.
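Since the script only finds domains "roughly", one simple heuristic, given a list of search-result URLs for a company, is to take the host of the first result that is not a directory or social site. This is a sketch of that idea, not the script's actual logic; the `skip` set and both function names are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the URL's host, lowercased, without a leading 'www.'."""
    host = urlparse(url).hostname or ""  # .hostname is lowercased and excludes the port
    return host[4:] if host.startswith("www.") else host


def guess_domain(result_urls: Iterable[str]) -> Optional[str]:
    """Return the first result domain that is not on a (hypothetical) skip list."""
    skip = {"linkedin.com", "wikipedia.org", "facebook.com", "twitter.com"}
    for url in result_urls:
        d = domain_of(url)
        # Skip the listed sites and any of their subdomains, e.g. en.wikipedia.org
        if d and not any(d == s or d.endswith("." + s) for s in skip):
            return d
    return None
```

Because the first non-skipped result wins, results should be passed in ranking order; companies with no official site in the results will still get a wrong or missing guess.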

Extract Tables from PDF into Excel

python TablefromPDF.py => Given a PDF file, extract the tables in it.

Extract Links from PDF into Excel

python LinkinPDF.py => Given a PDF file, extract all the links in it. If an Excel file of company names (lowercase, no spaces) is also given, the script can match the company names with the links.
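The requirement that company names be lowercase with no spaces suggests a substring match against the link text. The sketch below shows that kind of matching under this assumption; it is not taken from LinkinPDF.py, and `normalize` and `match_companies` are hypothetical names. Reading the names from Excel (for example with `pandas.read_excel`) is left out.

```python
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lowercase a company name and drop all whitespace, e.g. 'Big Co' -> 'bigco'."""
    return "".join(name.lower().split())


def match_companies(companies: Iterable[str], links: Iterable[str]) -> Dict[str, List[str]]:
    """Map each company name to the links whose lowercased text contains its normalized form."""
    lowered_links = [(link, link.lower()) for link in links]
    matches: Dict[str, List[str]] = {}
    for company in companies:
        key = normalize(company)
        if not key:  # skip empty cells
            continue
        matches[company] = [link for link, low in lowered_links if key in low]
    return matches
```

Substring matching is deliberately loose: a short name such as "ace" would also match a link containing "place", so results may need a manual check.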

Scrape attendees (name, company, position) from a website

python CloudExpo.py

Delegates of Tech Field Day

python TechFieldDay.py

AS info by rank

python ASRank.py

AS names and their neighbors for selected organizations

python ASRankSelected.py

About

Web Crawlers
