GitHub - weightwatchers-carlanderson/recipe-scrapers: Python package for scraping recipes data

A simple web scraping tool for recipe sites.

pip install recipe-scrapers

then:

from recipe_scrapers import scrape_me

# Pass the URL as a string; it can be a URL from any site listed below.
scraper = scrape_me('https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/')

# Q: What if the recipe site I want to extract information from is not listed below?
# A: You can give it a try with the wild_mode option! If there is Schema/Recipe available it will work just fine.
scraper = scrape_me('https://www.feastingathome.com/tomato-risotto/', wild_mode=True)

scraper.title()
scraper.total_time()
scraper.yields()
scraper.ingredients()
scraper.instructions()
scraper.image()
scraper.host()
scraper.links()
scraper.nutrients()  # if available
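
Since some fields (nutrients, for example) are only there "if available", it can help to collect everything in one pass and tolerate the gaps. The following is a minimal sketch, not part of recipe-scrapers: `collect_fields` and `FakeScraper` are hypothetical names used for illustration, and because the source does not say which exceptions a real scraper raises, the helper assumes any exception means "not available".

```python
FIELDS = ("title", "total_time", "yields", "ingredients",
          "instructions", "image", "host", "nutrients")


def collect_fields(scraper, fields=FIELDS):
    """Call each accessor on `scraper`; store None when it is missing or fails."""
    result = {}
    for name in fields:
        method = getattr(scraper, name, None)
        try:
            result[name] = method() if callable(method) else None
        except Exception:  # assumption: any failure means "not available"
            result[name] = None
    return result


class FakeScraper:
    """Hypothetical stand-in so the sketch runs without network access."""

    def title(self):
        return "Tomato Risotto"

    def total_time(self):
        return 45

    def nutrients(self):
        raise NotImplementedError("no nutrition data on this page")


if __name__ == "__main__":
    print(collect_fields(FakeScraper()))
```

With a real scraper, the same helper would be called as `collect_fields(scrape_me(url))`.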

Note: scraper.links() returns a list of dictionaries containing all of the <a> tag attributes. The attribute names are the dictionary keys.
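
Because each link is a plain dictionary of attributes, ordinary dictionary operations are enough to filter them. The sketch below assumes the shape described in the note above; `links_with_attribute` and the `sample_links` data are hypothetical and only stand in for a real `scraper.links()` result.

```python
def links_with_attribute(links, attribute="href"):
    """Return the value of `attribute` for every link dict that has it."""
    return [link[attribute] for link in links if attribute in link]


# Hypothetical sample shaped like the return value described in the note:
# one dict per <a> tag, keyed by attribute name.
sample_links = [
    {"href": "https://example.com/recipe/1", "title": "First recipe"},
    {"name": "top"},  # an <a> tag without an href attribute
    {"href": "/about", "id": "about-link"},
]

if __name__ == "__main__":
    print(links_with_attribute(sample_links))
```

Not every `<a>` tag carries an `href`, so checking for the key avoids a `KeyError` on anchors such as the second entry.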

Scrapers available for:

Contribute

Part of the reason I want this open sourced is that when a site changes its design, the scraper for it has to be updated.

If you spot a design change (or something else) that breaks the scraper for a given site, please open an issue as soon as possible.

If you are a programmer, PRs with fixes are warmly welcomed and acknowledged with a virtual beer.

If you want a scraper for a new site added

Open an issue providing the site name and a link to a recipe from it.
If you are a developer and want to code the scraper on your own:
- If Schema is available on the site - you can do this
  - How do I know if a schema is available on my site?
- Otherwise, scrape the HTML - like this
- Generating a new scraper class:
```
python generate.py <ClassName> <URL>
```
  - ClassName: The name of the new scraper class.
  - URL: The URL of an example recipe from the target site. The content will be stored in test_data to be used with the test class.

For Devs / Contribute

Assuming you have python3 installed, navigate to the directory where you want this project to live and run these commands:

git clone git@github.com:hhursev/recipe-scrapers.git &&
cd recipe-scrapers &&
python3 -m venv .venv &&
source .venv/bin/activate &&
pip install -r requirements-dev.txt &&
pre-commit install &&
python -m coverage run -m unittest &&
python -m coverage report

To run a single unittest for a newly developed scraper:

python -m coverage run -m unittest tests.test_myscraper

FAQ

How do I know if a website has a Recipe Schema? Run this in a Python shell:

from recipe_scrapers import scrape_me
scraper = scrape_me('<url of a recipe from the site>', wild_mode=True)
# if no error is raised - there's schema available:
scraper.title()
scraper.instructions()  # etc.
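
If you already have a page's HTML saved locally, a rough offline check is also possible. The sketch below uses only the standard library and assumes the site publishes its schema as JSON-LD inside a `<script type="application/ld+json">` tag; it does not detect other encodings such as microdata, and it is not how `wild_mode` itself is implemented. The names `has_recipe_jsonld`, `_JsonLdCollector` and `_contains_recipe` are hypothetical.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _contains_recipe(node):
    """Recursively look for an object whose @type is (or includes) 'Recipe'."""
    if isinstance(node, dict):
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if "Recipe" in types:
            return True
        return any(_contains_recipe(value) for value in node.values())
    if isinstance(node, list):
        return any(_contains_recipe(item) for item in node)
    return False


def has_recipe_jsonld(html):
    """Return True if any JSON-LD block in `html` describes a Recipe."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing
        if _contains_recipe(data):
            return True
    return False
```

A `False` result here only means no JSON-LD Recipe was found; trying `wild_mode=True` as shown above remains the direct test.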

Special thanks to:

All the contributors who helped improve the package. You are awesome!
