This is one of my previous projects. It scrapes website content with Python and Selenium, driving headless Chrome through ChromeDriver.
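
For orientation, a headless Chrome session can be started roughly as follows. This is only a minimal sketch and not the code from this repository: the function names, the window size, and the chosen Chrome flags are assumptions made for illustration, and `--headless=new` needs a reasonably recent Chrome (older versions use plain `--headless`).

```python
def headless_chrome_arguments(window_size=(1920, 1080)):
    """Return the Chrome command-line flags used for a headless session.

    The flag set is an assumption for this sketch, not the project's settings.
    """
    width, height = window_size
    return [
        "--headless=new",  # older Chrome versions expect "--headless"
        "--disable-gpu",
        f"--window-size={width},{height}",
    ]


def create_headless_driver():
    """Create a Selenium Chrome driver configured with the flags above."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Calling `create_headless_driver()` requires Selenium and a matching Chrome and ChromeDriver to be installed. Remember to call `driver.quit()` when the scrape is finished.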

Developing the scripts for the target sites involved some difficulties, such as AJAX-driven pagination and image loading, random waiting times, and redirection to a Cloudflare page. With patient observation and refinements to specific functions, these problems could be solved effectively.
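
The waiting and redirection problems above can be approached with small helpers like the ones below. This is a hedged sketch using only the standard library, not the functions from this repository: the names `random_delay`, `looks_like_cloudflare_page`, and `wait_until`, the delay ranges, and the title markers are all assumptions for illustration.

```python
import random
import time


def random_delay(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def looks_like_cloudflare_page(title):
    """Heuristic (assumption): match page titles seen on interstitial pages."""
    markers = ("just a moment", "attention required")
    lowered = title.lower()
    return any(marker in lowered for marker in markers)


def wait_until(condition, timeout=10.0, poll=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns a truthy value or the timeout passes.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(poll)
```

With a Selenium driver, one possible use is `wait_until(lambda: not looks_like_cloudflare_page(driver.title))` after loading a page, followed by `random_delay()` before requesting the next one. The same polling helper can wait for content that AJAX pagination loads after a click.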

There are also cases where scraping is not really feasible, such as when a CAPTCHA is required, or when there is a limit on the number of pages that can be viewed or the number of images that can be downloaded per registered user account.

The scraped website content is stored as JSON files along with the downloaded image files. Please check the json folder for reference.
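
As an illustration of this storage layout, the sketch below writes one scraped record as a JSON file next to its image file. It is an assumption-laden example, not the project's actual code: the function name `save_record`, the `image_file` key, the `.jpg` extension, and the one-file-per-record naming are hypothetical.

```python
import json
from pathlib import Path


def save_record(record, image_bytes, out_dir, name):
    """Write image_bytes and a JSON description of record into out_dir.

    Returns the paths of the JSON file and the image file.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    image_path = out / f"{name}.jpg"
    image_path.write_bytes(image_bytes)

    # Copy the record and note which image file belongs to it.
    stored = dict(record, image_file=image_path.name)
    json_path = out / f"{name}.json"
    json_path.write_text(
        json.dumps(stored, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return json_path, image_path
```

Using `ensure_ascii=False` keeps non-ASCII text from the scraped pages readable in the JSON output.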

Folders and files:

jinja2
selenium3
selenium4
soup
.gitignore
README.md

CarsonBytes/python_scraping

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages