Daraz is an ecommerce platform. This scraper will follow pagination to collect all the products in a category.

seemab-yamin/daraz-product-scraper

Daraz Product Scraper

Daraz Product Scraper is a Python scraper built with Scrapy and Playwright that handles Daraz's dynamic content and anti-bot measures. It extracts product details (titles, prices, etc.) from complex listing structures while respecting robots.txt, and it cleans prices and converts them to USD.

Goal:

  • Render dynamic content in headless mode.
  • Handle anti-bot services to avoid being blocked.
  • Respect robots.txt guidelines.
  • Extract complex product listing structures and dynamic prices.
  • Validate data: clean price values and convert them to US dollars ($).
  • Handle pagination by following the next page until the last page is reached.
  • Apply a download delay to avoid overloading the server.
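
The price cleaning and dollar conversion mentioned in the goals could look roughly like the sketch below. It is not taken from the repository: the function names `clean_price` and `to_usd`, the "Rs. 45,999" input format, and the `PKR_PER_USD` rate are all assumptions made for illustration.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Assumed placeholder exchange rate; a real run would load a current rate.
PKR_PER_USD = Decimal("280")


def clean_price(raw):
    """Pull the numeric part out of a price string such as 'Rs. 45,999'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None  # no digits found, treat the price as missing
    return Decimal(match.group().replace(",", ""))


def to_usd(pkr, rate=PKR_PER_USD):
    """Convert a PKR amount to USD, rounded to two decimal places."""
    return (pkr / rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    price = clean_price("Rs. 45,999")
    if price is not None:
        print(price, to_usd(price))
```

Using `Decimal` instead of `float` keeps the rounding of currency values predictable.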

OS Platform:

Python 3 (Linux/Mac/WSL)

Installing Dependencies:

To install the dependencies, open a terminal and run:

git clone https://github.com/seemab-yamin/daraz-product-scraper
cd daraz-product-scraper
pip install -r requirements.txt
playwright install --with-deps chromium

Run:

python3 scraper.py {category_url}

# Example
python3 scraper.py https://www.daraz.pk/vented-dryers
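
As an illustration of how a category URL could be turned into per-page URLs for pagination, here is a minimal sketch. It assumes the site accepts a `page` query parameter, which this README does not confirm, and `page_url` is a hypothetical helper name rather than a function from the repository.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


def page_url(category_url, page):
    """Return category_url with its `page` query parameter set to `page`.

    Assumption: the listing is paginated through a `page` query parameter.
    """
    parts = urlparse(category_url)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]  # overwrite any existing page value
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    print(page_url("https://www.daraz.pk/vented-dryers", 2))
```

A scraper that follows a "next page" link found in the rendered page would not need this helper; the two approaches are alternatives.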

Pros:

  • Stores detailed logs for later debugging.
  • CONCURRENT_REQUESTS can be increased to run more instances in parallel and speed up scraping, at the cost of higher resource usage.
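
The settings touched on in this README (robots.txt, download delay, concurrency, logging) map onto standard Scrapy setting names. The sketch below shows one way they might be collected. The values, the `build_settings` helper, and the log file name are assumptions, not the project's actual configuration.

```python
def build_settings(concurrent_requests=4, download_delay=2.0):
    """Return a dict of Scrapy settings; all values here are illustrative."""
    return {
        "ROBOTSTXT_OBEY": True,                      # respect robots.txt
        "DOWNLOAD_DELAY": download_delay,            # seconds between requests
        "CONCURRENT_REQUESTS": concurrent_requests,  # higher is faster but heavier
        "LOG_FILE": "scraper.log",                   # assumed log file name
        "LOG_LEVEL": "DEBUG",                        # keep detailed logs for debugging
    }


if __name__ == "__main__":
    for name, value in build_settings(concurrent_requests=8).items():
        print(f"{name} = {value!r}")
```

A dict like this can be passed to Scrapy's `CrawlerProcess(settings=...)`. Because each concurrent request may drive a browser page, raising `CONCURRENT_REQUESTS` tends to increase memory use quickly.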

Cons:

  • Browser-based solutions are memory-expensive, so they should be reserved for specific use cases such as bypassing blocking, loading dynamic content, or performing automation.

Improvements:

  • Add an explicit sleep or scroll step, because images were not fully rendered in the page source.
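
One possible shape for the scroll step is sketched below. It assumes a Playwright sync-API `page` object and uses only `page.evaluate`, `page.mouse.wheel`, and `page.wait_for_timeout`. The helper name `scroll_page` and its default values are hypothetical, and a project using Playwright's async API would need `await` on each call.

```python
def scroll_page(page, step_px=800, pause_ms=300, max_steps=50):
    """Scroll down in steps so lazy-loaded images get a chance to render.

    Stops when the scroll position no longer changes or after max_steps.
    Returns the number of scroll steps performed.
    """
    steps = 0
    for _ in range(max_steps):
        before = page.evaluate("window.scrollY")
        page.mouse.wheel(0, step_px)     # scroll down by step_px pixels
        page.wait_for_timeout(pause_ms)  # give images time to load
        steps += 1
        if page.evaluate("window.scrollY") == before:
            break  # position did not move, so the bottom was reached
    return steps
```

Scrolling in steps rather than jumping straight to the bottom is a deliberate choice here: images that load only when they enter the viewport might otherwise be skipped.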

Dataset Screenshot:

Dataset Image

Video Demo:

Demo Video
