reddit crawler:

I'm crawling the Reddit website and want to store the scraped data in a database (PostgreSQL, maybe). In this project I also want to use anti-blocking scraping techniques: IP rotation, a real User-Agent, other request headers, random intervals between requests, and a Referer header (a quick sketch follows below).

P.S.: Docker is required.
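
A minimal sketch of those request-hardening techniques with requests might look like the following; the proxy addresses, header values, and delay range are placeholders, not the project's actual configuration:

```python
import random
import time

import requests

# Hypothetical proxy pool: replace with proxies you actually control.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
]

# A real browser User-Agent plus a Referer makes requests look less bot-like.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Referer": "https://www.google.com/",
}

def fetch(url):
    proxy = random.choice(PROXIES)  # IP rotation: a random proxy per request
    response = requests.get(
        url,
        headers=HEADERS,
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
    time.sleep(random.uniform(2, 6))  # random interval between requests
    return response
```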

installation:

  1. clone the repo: `git clone https://github.com/cs-fedy/reddit-crawler`
  2. run `docker compose up -d` to start the database.
  3. install virtualenv using pip: `sudo pip install virtualenv`
  4. create a new virtualenv: `virtualenv venv`
  5. activate the virtualenv: `source venv/bin/activate`
  6. install the requirements: `pip install -r requirements.txt`
  7. run the script and enjoy: `python scraper.py`
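
Once the database container is up, connecting from Python with psycopg2 and python-dotenv could look roughly like this; the environment variable names and the posts table are illustrative assumptions, not necessarily what scraper.py actually uses:

```python
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()  # read key-value pairs from a .env file in the project root

# Hypothetical variable names: match them to your .env and docker-compose.yml.
conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "localhost"),
    port=os.getenv("DB_PORT", "5432"),
    dbname=os.getenv("DB_NAME", "reddit"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASSWORD"),
)

# The connection context manager wraps the statements in a transaction.
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            id SERIAL PRIMARY KEY,
            title TEXT NOT NULL,
            url TEXT UNIQUE,
            scraped_at TIMESTAMP DEFAULT NOW()
        )
        """
    )
conn.close()
```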

used tools:

  1. selenium: primarily for automating web applications for testing purposes, but certainly not limited to that; boring web-based administration tasks can (and should) be automated as well.
  2. BeautifulSoup: a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree (see the sketch after this list).
  3. python-dotenv: adds .env support to your Django/Flask apps in development and deployment.
  4. psycopg2: Python-PostgreSQL database adapter.
  5. tabulate: pretty-print tabular data.
  6. requests: Python HTTP for Humans.
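
To show how requests and BeautifulSoup fit together (item 2 above), here is a small sketch that lists post titles from a subreddit; old.reddit.com and the CSS selectors are assumptions and may break whenever Reddit changes its markup:

```python
import requests
from bs4 import BeautifulSoup

# old.reddit.com serves mostly static HTML, which is easier to parse
# than the JavaScript-heavy redesign.
response = requests.get(
    "https://old.reddit.com/r/python/",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=10,
)
soup = BeautifulSoup(response.text, "html.parser")

# Assumed selectors: each post sits in div.thing with its title in a.title.
for post in soup.select("div.thing"):
    title_tag = post.select_one("a.title")
    if title_tag:
        print(title_tag.get_text(strip=True), title_tag.get("href"))
```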

Scraping tips:

  1. Do not follow the same crawling pattern: incorporate some random clicks on the page, mouse movements, and other random actions that make the spider look like a human.
  2. Make requests through proxies and rotate them as needed: create a pool of IPs, pick a random one for each request, and spread your requests across multiple IPs (see: "How to send requests through a Proxy in Python 3 using Requests").
  3. Rotate User-Agents and the corresponding HTTP request headers between requests (see: "How to fake and rotate User Agents using Python 3").
  4. Use a headless browser like Pyppeteer, Selenium, or Playwright (a Selenium sketch follows this list).
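
For tips 1 and 4 together, a headless Selenium sketch with a few randomized mouse movements might look like this; the URL and the offsets are illustrative only:

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://old.reddit.com/r/python/")

    # Tip 1: a few random mouse movements and pauses to look less robotic.
    actions = ActionChains(driver)
    for _ in range(3):
        actions.move_by_offset(random.randint(5, 50), random.randint(5, 50))
        actions.pause(random.uniform(0.2, 1.0))
    actions.perform()

    time.sleep(random.uniform(2, 5))  # random interval before the next page
    print(driver.title)
finally:
    driver.quit()
```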

Author:

created at 🌙 with 💻 and ❤ by f0ody
