Nabeel-ejaz / olx-phone-loader Public

forked from rodolfoghi/olx-phone-loader

A web scraper to get data from OLX ads.

csv
images
.gitignore
app.py
converter.py
file_helper.py
readme.md
requirements.txt
run.bat
util.py

OLX Phone Loader

A web scraper to get data from OLX ads.

How to use

Clone or download this repository.
Install the dependencies with pip install -r requirements.txt.
Install Tesseract by following its installation instructions.
In converter.py, set the variable ocr.pytesseract.tesseract_cmd to the path of the Tesseract executable.
Run python app.py in your preferred terminal.
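
The Tesseract path step can be sketched as follows. This is only an illustration, not the code in converter.py: the helper names find_tesseract and configure_ocr, the fallback path, and the import alias ocr (inferred from the variable name above) are all assumptions.

```python
import shutil

# Hypothetical fallback; adjust to wherever Tesseract is installed on your machine.
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_TESSERACT):
    """Return the tesseract executable found on PATH, or the fallback path."""
    return shutil.which("tesseract") or fallback


def configure_ocr(path=None):
    """Point pytesseract at the Tesseract binary and return the path used."""
    # Third-party import kept local so find_tesseract works without pytesseract.
    import pytesseract as ocr  # alias assumed from the variable name in the readme

    ocr.pytesseract.tesseract_cmd = path or find_tesseract()
    return ocr.pytesseract.tesseract_cmd
```

If Tesseract is already on your PATH, find_tesseract() picks it up and the fallback is never used.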

How it works

Makes a request to the URL using urllib.request to get the list of ads.
Parses the HTML response using BeautifulSoup.
Makes a new request for each ad.
Searches the response for the phone number. The phone number is a GIF file. :(
Saves the GIF file in the images folder.
Converts the GIF to PNG and saves it.
Reads the phone text from the image using pytesseract.
Lastly, saves the data to a CSV file using Python's csv module.
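
The steps above could look roughly like the sketch below. It is not the code from app.py or converter.py. The function names, the "every link with an href" selector, the use of Pillow for the GIF to PNG conversion, and the url/phone column names are assumptions made for illustration.

```python
import csv
import io
import urllib.request
from pathlib import Path


def fetch(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def ad_links(html):
    """Return the href of every link on the listing page (selector is an assumption)."""
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def gif_to_png(gif_path):
    """Convert a saved GIF to a PNG next to it and return the PNG path."""
    from PIL import Image  # third-party: Pillow (assumed; the readme names no library)

    png_path = Path(gif_path).with_suffix(".png")
    with Image.open(gif_path) as img:
        img.convert("RGB").save(png_path)
    return png_path


def read_phone(png_path):
    """Run OCR on the PNG and return the recognised text, stripped of whitespace."""
    from PIL import Image
    import pytesseract  # third-party; needs Tesseract installed

    with Image.open(png_path) as img:
        return pytesseract.image_to_string(img).strip()


def save_rows(rows, out):
    """Write a header plus (url, phone) rows to an open text file."""
    writer = csv.writer(out)
    writer.writerow(["url", "phone"])  # hypothetical column names
    writer.writerows(rows)


if __name__ == "__main__":
    # Demonstrates only the CSV step, using an in-memory buffer and made-up data.
    buffer = io.StringIO()
    save_rows([("https://example.com/ad/1", "11999990000")], buffer)
    print(buffer.getvalue())
```

The third-party imports sit inside the functions that need them, so the CSV step can be tried without BeautifulSoup, Pillow, or Tesseract installed. When writing to a real file, open it with newline="" as the csv module documentation recommends.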

Languages

Python 97.8%
Batchfile 2.2%