I'm crawling Reddit and storing the scraped data in a database (PostgreSQL). This project also uses techniques to avoid getting blocked while crawling: IP rotation, real user agents, other request headers, random intervals between requests, and a referrer.
P.S.: Docker is required.
- clone the repo
git clone https://github.com/cs-fedy/reddit-crawler
- start the db:
docker compose up -d
- install virtualenv using pip:
sudo pip install virtualenv
- create a new virtualenv:
virtualenv venv
- activate the virtualenv:
source venv/bin/activate
- install requirements:
pip install -r requirements.txt
- run the script and enjoy:
python scraper.py
- selenium: Automates web browsers. It is primarily used for testing web applications, but is certainly not limited to that. Boring web-based administration tasks can (and should) be automated as well.
- BeautifulSoup: Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
- python-dotenv: Add .env support to your django/flask apps in development and deployments.
- psycopg2: psycopg2 - Python-PostgreSQL Database Adapter.
- tabulate: Pretty-print tabular data.
- requests: Python HTTP for Humans.
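As a rough illustration of how psycopg2 could be used to store scraped posts, here is a minimal sketch. The `posts` table, its columns (`title`, `url`, `score`), and the `DB_*` environment variable names are assumptions made for this example, not taken from the repo.

```python
import os

# Hypothetical table and columns; adjust to the real schema.
INSERT_POST = "INSERT INTO posts (title, url, score) VALUES (%s, %s, %s)"


def connect_from_env():
    """Open a PostgreSQL connection using settings read from the environment."""
    import psycopg2  # third-party; imported here so save_posts can be used without it

    # The DB_* variable names and defaults are assumptions for this sketch.
    return psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=os.getenv("DB_PORT", "5432"),
        dbname=os.getenv("DB_NAME", "reddit"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD", ""),
    )


def save_posts(conn, posts):
    """Insert scraped posts (dicts with title, url, score) and commit once."""
    rows = [(p["title"], p["url"], p["score"]) for p in posts]
    with conn.cursor() as cur:
        # Parameters are passed separately so psycopg2 handles the quoting.
        cur.executemany(INSERT_POST, rows)
    conn.commit()
    return len(rows)
```

Calling `load_dotenv()` from python-dotenv before `connect_from_env()` would load these variables from a `.env` file, which keeps credentials out of the source code.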
- Do not follow the same crawling pattern: Incorporate some random clicks on the page, mouse movements and random actions that will make a spider look like a human.
- Make requests through proxies and rotate them as needed: Create a pool of IPs and pick a random one for each request, so that requests are spread across multiple IPs. See "How to send requests through a Proxy in Python 3 using Requests".
- Rotate User Agents and the corresponding HTTP request headers between requests. See "How to fake and rotate User Agents using Python 3".
- Use a headless browser like Pyppeteer, Selenium or Playwright
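The proxy rotation, user agent rotation, referrer and random interval ideas above could be combined roughly like this with requests. This is only a sketch: the proxy addresses, the user agent strings, the default referrer and the delay bounds are placeholder assumptions, not values from this project.

```python
import random
import time

# Hypothetical pools: replace with proxies and user agents you actually use.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]


def build_request_kwargs(referer="https://www.google.com/"):
    """Pick a random proxy and user agent and return kwargs for requests.get."""
    proxy = random.choice(PROXIES)
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": referer,
    }
    return {
        "headers": headers,
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


def random_delay(low=2.0, high=6.0):
    """Return a random pause length in seconds, drawn uniformly from [low, high]."""
    return random.uniform(low, high)


def fetch(url):
    """Sleep for a random interval, then request the URL through a random proxy."""
    import requests  # third-party; imported lazily so the helpers above stay stdlib-only

    time.sleep(random_delay())
    response = requests.get(url, **build_request_kwargs())
    response.raise_for_status()
    return response.text
```

Each call to `fetch` picks a fresh proxy and user agent, so consecutive requests do not share the same fingerprint or timing.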
created at 🌙 with 💻 and ❤ by f0ody
- Fedi Abdouli - reddit crawler
- my twitter account FediAbdouli
- my instagram account f0odyy