diff --git a/content/post/series-building-a-scraper-that-scales/2023-10-25-the-motivation-to-build-a-scraper-in-python/index.md b/content/post/series-building-a-scraper-that-scales/2023-10-25-the-motivation-to-build-a-scraper-in-python/index.md index 13c32af..d4c6987 100644 --- a/content/post/series-building-a-scraper-that-scales/2023-10-25-the-motivation-to-build-a-scraper-in-python/index.md +++ b/content/post/series-building-a-scraper-that-scales/2023-10-25-the-motivation-to-build-a-scraper-in-python/index.md @@ -124,7 +124,8 @@ analysis. Switch to Beautiful Soup: I wanted to transition from [Selenium](https://pypi.org/project/selenium/) to [Beautiful Soup](https://pypi.org/project/beautifulsoup4/) for parsing raw -HTML. This change ought to significantly reduced the time needed to scrape data. +HTML. This change ought to significantly reduce the time needed to scrape data. + Parametrization of Postal Codes: Allowing postal codes as an input parameter to make the scraper more flexible. I was initially only looking into a few communes in Brussels that I was interested in. However, if I wanted to do some @@ -153,6 +154,7 @@ run. SQLite Database: I wanted to use a SQLite database with an initial schema to store the data I'd be accumulating over time. I've really enjoyed working with [`SQLAlchemy`](https://pypi.org/project/SQLAlchemy/) as the ORM mapper to interact with the database. + Data Validation with Pydantic: Before adding the scraped data to the database, I implemented validation checks using [Pydantic](https://pypi.org/project/pydantic/). This ensured that only @@ -164,7 +166,7 @@ complex analyses and experiments. ## Final comments -In the next blog post in the series, I will go over the implementation details. +In the next blog [post](/a-scraper-that-scales-part-ii/) in the series, I will go over the implementation details. For those interested, you can find the current state of the project [here](https://github.com/roumail/immoweb-scraper/tree/v1.0.0). 
diff --git a/content/post/series-building-a-scraper-that-scales/2023-10-26-stateful-applications-need-to-be-designed-differently/index.md b/content/post/series-building-a-scraper-that-scales/2023-10-26-stateful-applications-need-to-be-designed-differently/index.md new file mode 100644 index 0000000..6640abc --- /dev/null +++ b/content/post/series-building-a-scraper-that-scales/2023-10-26-stateful-applications-need-to-be-designed-differently/index.md @@ -0,0 +1,14 @@ +--- +title: "Stateful Applications Need to Be Designed Differently" +summary: "The improvements and what remains" +categories: ["technology"] +tags: ["python", "web-scraping"] +series: series-building-a-scraper-that-scales +authors: + - admin +url: "/a-scraper-that-scales-part-ii/" +date: 2023-10-26T16:15:28+02:00 +draft: false +--- + +Complete me \ No newline at end of file