OLX Phone Loader

A web scraper to get data from OLX ads.

How to use

  1. Clone or download this repository.
  2. Install the dependencies with pip install -r requirements.txt.
  3. Install Tesseract, instructions here.
  4. Set the value of the variable ocr.pytesseract.tesseract_cmd in converter.py to the path of your Tesseract executable.
  5. Run python app.py in your preferred terminal.
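
For step 4, a minimal sketch of how the Tesseract path could be set. It assumes pytesseract is imported under the alias `ocr`, as the variable name suggests. The helper names `resolve_tesseract_cmd` and `configure_ocr` and the fallback path are hypothetical and not taken from converter.py.

```python
import shutil

# Hypothetical fallback path; change it to wherever Tesseract is installed.
DEFAULT_TESSERACT_CMD = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(default=DEFAULT_TESSERACT_CMD):
    """Return the tesseract executable found on PATH, else the fallback."""
    return shutil.which("tesseract") or default


def configure_ocr():
    """Point pytesseract at the resolved executable and return the path."""
    # Imported here so the helper above works without pytesseract installed.
    import pytesseract as ocr  # alias assumed from the variable name in step 4

    ocr.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    return ocr.pytesseract.tesseract_cmd
```

If Tesseract is already on your PATH, the fallback is never used and no manual edit should be needed.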

How it works

  1. Makes a request to the listing URL using urllib.request to get the list of ads.
  2. Parses the HTML response using BeautifulSoup.
  3. Makes a new request for each ad.
  4. Searches the response for the phone number. The phone is a GIF file. :(
  5. Saves the GIF file in the images folder.
  6. Converts the GIF to PNG and saves it.
  7. Reads the phone text from the image using pytesseract.
  8. Lastly, saves the data to a CSV file using the csv Python lib.
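
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in app.py or converter.py: the function names, the `"/ad/"` link filter, the CSV column names, and the use of Pillow for the GIF to PNG conversion are all hypothetical.

```python
import csv
import urllib.request


def fetch(url):
    """Steps 1 and 3: download a page (or an image) and return the raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def parse_ad_links(html, marker="/ad/"):
    """Step 2: collect ad links from the listing page.

    The marker is a placeholder; the real OLX markup must be inspected.
    """
    from bs4 import BeautifulSoup  # third-party, imported only when needed

    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True) if marker in a["href"]]


def gif_to_png(gif_path, png_path):
    """Step 6: convert the phone GIF to PNG (Pillow is an assumption here)."""
    from PIL import Image

    with Image.open(gif_path) as image:
        image.convert("RGB").save(png_path, "PNG")


def read_phone(png_path):
    """Step 7: run OCR on the PNG and return the recognized text."""
    import pytesseract
    from PIL import Image

    with Image.open(png_path) as image:
        return pytesseract.image_to_string(image).strip()


def save_csv(rows, path):
    """Step 8: write one row per ad; the column names are illustrative."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["url", "phone"])
        writer.writeheader()
        writer.writerows(rows)
```

The third-party imports are deferred into the functions that need them, so the CSV step can be tried on its own, for example `save_csv([{"url": "https://example.com/ad/1", "phone": "11 99999-0000"}], "ads.csv")`.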