A large dataset of Sri Lankan addresses
This repository contains Python code for web scraping using the requests and BeautifulSoup libraries. The code scrapes data from the "Rainbow Pages" website and saves it to a file named data.txt.
Before running the code, make sure you have Python installed on your system. You can download Python from the official website: https://www.python.org/downloads/
- Clone this repository to your local machine using the following command:
git clone https://github.com/TharinduMadhusanka/sri-lanka-addresses-web-scraping.git
- Change the directory to the project folder: cd sri-lanka-addresses-web-scraping
- Install the required Python packages by running: pip install requests beautifulsoup4
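To confirm the installation worked before running the scraper, a small check like the one below may help. It is only a sketch and not part of this repository: the helper name missing_modules is hypothetical, and it relies on the fact that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # "bs4" is the import name provided by the beautifulsoup4 package.
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported missing, rerun the pip install command above inside the same Python environment you use to run the script.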
To run the web scraping script, use the following command:
python main.py
The script scrapes data from the "Rainbow Pages" website and saves it to the data.txt file in the project folder. Web scraping may be subject to a website's terms of service, so respect the site's policies and avoid sending many requests in a short time, which can get you blocked.
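One way to avoid sending many requests in a short time is to pause between consecutive page fetches. The sketch below illustrates the idea and is not necessarily how main.py works: the function name fetch_politely, the fetch callable, and the 2-second default delay are all assumptions made for this example.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,  # assumed value; tune it to the site's policies
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```

In practice, fetch could be a thin wrapper around the requests library, for example a function that returns requests.get(url, timeout=10).text, and each returned page could then be parsed with BeautifulSoup. Passing fetch and sleep in as parameters keeps the pacing logic easy to try out without touching the network.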
Here is my blog post. Check it out. 😊
- The code in this repository is for educational purposes and may require additional modifications for other use cases.
- Special thanks to the contributors of the requests and BeautifulSoup libraries for providing powerful tools for web scraping in Python.