A web scraper to get data from OLX ads.
- Clone or download this repository.
- Install the dependencies with `pip install -r requirements.txt`.
- Install Tesseract, following its installation instructions.
- Set the variable `ocr.pytesseract.tesseract_cmd` in `converter.py` to the path of your Tesseract executable.
- Run `python app.py` in your preferred terminal.
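
For the Tesseract path step, a minimal sketch of how the value could be chosen. The helper names `find_tesseract` and `configure_tesseract` and the Windows fallback path are assumptions for illustration and are not taken from this repository. The alias `ocr` is inferred from the variable name used above.

```python
import shutil


def find_tesseract(fallback=r"C:\Program Files\Tesseract-OCR\tesseract.exe"):
    """Return the Tesseract binary found on PATH, or a fallback path.

    The fallback is only a common Windows install location (an assumption);
    adjust it to wherever Tesseract lives on your machine.
    """
    return shutil.which("tesseract") or fallback


def configure_tesseract():
    """Point pytesseract at the located binary and return the path used."""
    # Imported here so the helper above works even without pytesseract installed.
    import pytesseract as ocr

    ocr.pytesseract.tesseract_cmd = find_tesseract()
    return ocr.pytesseract.tesseract_cmd
```

If Tesseract is already on your `PATH`, `shutil.which` finds it and no hard-coded path is needed.
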
- Makes a request to the URL using `urllib.request` to get the list of ads.
- Parses the HTML response using BeautifulSoup.
- Makes a new request for each ad.
- Searches for the phone number in the response. The phone number is served as a GIF file. :(
- Saves the GIF file in the `images` folder.
- Converts the GIF to PNG and saves it.
- Reads the phone number text from the image using `pytesseract`.
- Lastly, saves the data to a CSV file using Python's `csv` module.
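
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this repository: the function names, the catch-all link selector, the CSV column names, and the use of Pillow for the GIF to PNG conversion are all hypothetical.

```python
import csv
import urllib.request
from pathlib import Path


def fetch_html(url):
    """Download a page with urllib.request and return it as text."""
    # Send a browser-like user agent, since some sites reject urllib's default.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_ad_links(html):
    """Return the href of every link in the page."""
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    # Assumption: a real scraper would use a site-specific selector here.
    return [a["href"] for a in soup.find_all("a", href=True)]


def gif_to_png(gif_path):
    """Convert a GIF to a PNG saved next to it and return the PNG path."""
    from PIL import Image  # third-party: Pillow (assumed, not confirmed by the README)

    png_path = Path(gif_path).with_suffix(".png")
    with Image.open(gif_path) as image:
        image.convert("RGB").save(png_path)
    return png_path


def read_phone(png_path):
    """Run OCR on the PNG and return the recognized text, stripped."""
    import pytesseract as ocr  # third-party: pytesseract
    from PIL import Image

    with Image.open(png_path) as image:
        return ocr.image_to_string(image).strip()


def save_rows(rows, csv_path, fieldnames=("title", "url", "phone")):
    """Write a list of dicts to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
```

The third-party imports sit inside the functions that need them, so the CSV step can be tried on its own, for example `save_rows([{"title": "Bike", "url": "http://example.com/1", "phone": "123"}], "ads.csv")`.
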