Daraz Product Scraper

Daraz Product Scraper is a Python scraper built with Scrapy and Playwright that handles Daraz's dynamic content and anti-bot measures. It extracts product details (titles, prices, etc.), respects robots.txt, and handles complex product listing structures. It also cleans prices and converts them to USD.

Goal:

Renders dynamic content in headless mode.
Handles anti-bot services to avoid being blocked.
Respects robots.txt guidelines.
Extracts complex product listing structures and dynamic prices.
Validates data: cleans price values and converts them to US dollars ($).
Handles pagination by navigating to the next page until the last page is reached.
Applies a download delay to avoid overloading the server.
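
The price cleaning and dollar conversion goal can be illustrated with a minimal sketch. It is not the code from item_pipeline.py: the function names `clean_price` and `to_usd`, the "Rs." input format, and the exchange rate `PKR_PER_USD` are all assumptions made for illustration.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical exchange rate (PKR per 1 USD), assumed for illustration only.
PKR_PER_USD = Decimal("280")


def clean_price(raw: str) -> Decimal:
    """Extract the numeric part of a price string such as 'Rs. 45,999'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no numeric price found in {raw!r}")
    # Drop thousands separators before converting to Decimal.
    return Decimal(match.group().replace(",", ""))


def to_usd(amount_pkr: Decimal, rate: Decimal = PKR_PER_USD) -> Decimal:
    """Convert a PKR amount to USD, rounded to two decimal places."""
    return (amount_pkr / rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Using `Decimal` instead of `float` avoids binary rounding surprises in currency values. In a real pipeline the rate would come from configuration or an exchange-rate service rather than a constant.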

OS Platform:

Python 3 (Linux/Mac/WSL)

Installing Dependencies:

To install the dependencies, open a terminal and run:

git clone https://github.com/seemab-yamin/daraz-product-scraper
cd daraz-product-scraper
pip install -r requirements.txt
playwright install --with-deps chromium

Run:

python3 scraper.py {category_url}

# Example
python3 scraper.py https://www.daraz.pk/vented-dryers

Pros:

Stores detailed logs for later debugging.
CONCURRENT_REQUESTS can be increased to start more concurrent instances and speed up scraping, at the cost of higher resource consumption.
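
As a rough sketch of where CONCURRENT_REQUESTS sits alongside the other options mentioned in this README, the settings could look like the dictionary below. The setting names are standard Scrapy and scrapy-playwright settings, but the dictionary name `CUSTOM_SETTINGS` and every value shown are assumptions, not the repository's actual configuration.

```python
# Illustrative settings only; the values are assumptions, not taken from scraper.py.
CUSTOM_SETTINGS = {
    # Honour the site's robots.txt rules.
    "ROBOTSTXT_OBEY": True,
    # Seconds to wait between requests to avoid overloading the server.
    "DOWNLOAD_DELAY": 2,
    # Higher values speed up the crawl but use more memory and CPU,
    # since every in-flight request may hold a browser page open.
    "CONCURRENT_REQUESTS": 4,
    # Route HTTP(S) downloads through scrapy-playwright.
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
    "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True},
}
```

A dictionary like this can be assigned to a spider's `custom_settings` attribute or passed to `CrawlerProcess(settings=...)`.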

Cons:

Browser-based solutions are memory-intensive, so they should be reserved for specific use cases such as bypassing blocking, loading dynamic content, or performing automation.

Improvements:

An explicit wait or scroll needs to be added, because images were not fully rendered in the page source.
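
One possible way to approach this improvement, sketched under assumptions, is to have scrapy-playwright scroll the page and pause before the response is returned. The helper names `scroll_steps` and `build_playwright_meta`, the number of scrolls, and the pause length are hypothetical; `PageMethod` is the scrapy-playwright class for calling Playwright page methods such as `evaluate` and `wait_for_timeout`.

```python
def scroll_steps(times: int = 3, pause_ms: int = 1000) -> list:
    """Return (method, argument) pairs: scroll down, then pause, repeated `times` times."""
    steps = []
    for _ in range(times):
        # Scroll by one full page height so lazy-loaded images start loading.
        steps.append(("evaluate", "window.scrollBy(0, document.body.scrollHeight)"))
        # Give the images time to render before the next scroll.
        steps.append(("wait_for_timeout", pause_ms))
    return steps


def build_playwright_meta(times: int = 3, pause_ms: int = 1000) -> dict:
    """Build request meta that asks scrapy-playwright to run the scroll steps."""
    # Imported here so the step list above can be inspected without the plugin installed.
    from scrapy_playwright.page import PageMethod

    methods = [PageMethod(name, arg) for name, arg in scroll_steps(times, pause_ms)]
    return {"playwright": True, "playwright_page_methods": methods}
```

The resulting dictionary would be passed as `meta` to a `scrapy.Request`. Fixed pauses are simple but slow down the crawl, so waiting for a specific image selector may be preferable once the page structure is known.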

Dataset Screenshot:

Video Demo:

Resources:

Scrapy: https://scrapy.org/
Scrapy-Playwright: https://github.com/scrapy-plugins/scrapy-playwright
Scrapy-Playwright Tutorial: https://scrapeops.io/python-scrapy-playbook/scrapy-playwright/
