Osta.ee web scraper

What is this?

This is a simple web scraper that collects product information from the osta.ee auction site. It scrapes a specified category on osta.ee and writes the product information it finds to a .json output file in the form:

{
    "title": "...",
    "price": "...",
    "img_href": "..."
}

where:

  • "title" - the title of the listed product
  • "price" - the current price of the auction
  • "img_href" - a hyperlink to the product picture featured on the listing.
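
As a rough illustration of working with the output, the sketch below writes and reads back records of this shape using the standard `json` module. It assumes the output file holds a JSON array of such objects, which this README does not confirm. The sample values and the `example_output.json` filename are made up for the example.

```python
import json
import os
import tempfile

# Hypothetical sample records in the shape described above.
records = [
    {"title": "Example monitor", "price": "25.00",
     "img_href": "https://example.com/monitor.jpg"},
    {"title": "Example laptop", "price": "150.00",
     "img_href": "https://example.com/laptop.jpg"},
]

# Assumption: the output file is a JSON array of product objects.
path = os.path.join(tempfile.gettempdir(), "example_output.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=4)

# Read the file back and pull out one field per product.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

titles = [item["title"] for item in loaded]
print(titles)
```

Note that "price" is kept as a string here, matching the form shown above. Convert it explicitly if you need to sort or compare prices.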

Setting things up

Make sure you have the items listed under Dependencies installed.

Using the scraper

To run the scraper, specify the category that you wish to scrape, e.g.:

python scraper.py arvutid

Subcategories are also supported, e.g.:

python scraper.py arvutid/monitorid

Specifying a second argument overrides the default output filename, e.g.:

python scraper.py arvutid/monitorid test.json
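
The argument handling described above could be sketched as follows. This is not the code from scraper.py. The `parse_args` helper and the `DEFAULT_OUTPUT` value are hypothetical, since the actual default output filename is not stated in this README.

```python
# Hypothetical default; the real default filename is not stated in the README.
DEFAULT_OUTPUT = "output.json"


def parse_args(argv):
    """Return (category, output_filename) from a list of CLI arguments.

    argv is expected to be sys.argv[1:], i.e. without the script name.
    """
    if not argv:
        raise ValueError("a category argument is required, e.g. 'arvutid'")
    category = argv[0]
    # An optional second argument overrides the default output filename.
    output = argv[1] if len(argv) > 1 else DEFAULT_OUTPUT
    return category, output


print(parse_args(["arvutid"]))
print(parse_args(["arvutid/monitorid", "test.json"]))
```

In a real script this would typically be called as `parse_args(sys.argv[1:])`.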

Things to keep in mind

  • Not all categories have been tested, so some will probably not work.

  • Since the script runs synchronously, it may take several seconds or even minutes to complete on bigger categories.

  • User input validation is minimal, so entering faulty arguments can break the script or cause unexpected behaviour.
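
Because validation is minimal, you may want to check arguments yourself before passing them to the scraper. The sketch below is one possible approach and is not part of scraper.py. The `is_valid_category` and `is_valid_output` helpers are hypothetical, and the pattern assumes category paths consist of lowercase letters, digits and hyphens separated by "/", which may be stricter or looser than what osta.ee actually uses.

```python
import re

# Assumption: a category path is one or more slug segments separated by "/",
# e.g. "arvutid" or "arvutid/monitorid".
CATEGORY_RE = re.compile(r"[a-z0-9-]+(?:/[a-z0-9-]+)*")


def is_valid_category(value):
    """Return True if value looks like a category or subcategory path."""
    return CATEGORY_RE.fullmatch(value) is not None


def is_valid_output(value):
    """Return True if value is a non-empty name ending in .json."""
    return value.lower().endswith(".json") and len(value) > len(".json")


for candidate in ["arvutid", "arvutid/monitorid", "arvutid/", "../etc"]:
    print(candidate, is_valid_category(candidate))
```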

Dependencies

  • Python >=3.8.2
  • BeautifulSoup4

Authors

  • Ranno Rajaste