bouvard edited this page Sep 13, 2010 · 5 revisions

To contribute a scraper to the project:

  1. Read the pages on Supported Languages and Dependencies.
  2. Fork the repository.
  3. Update the Sources page by marking the pages you will be working on with the label “In progress [Your name]”.
  4. Post to the Google Group to let everyone know that you have claimed a source.

Some tips for contributors:

  • All scrapers should derive from EventScraper.
  • A template with detailed notes can be found in the /scripts/pytemplates/example directory.
  • Only modify files in your scrapers directory. If you are creating utility modules or unit tests, put them there.
  • Keep your config file up to date, and update your version number whenever the data produced by your scraper may have changed.
  • Scrape everything: you can append as many fields as you need, so if data is available, go ahead and scrape it. There is (almost) no such thing as scraping too much data.
  • If you will be joining page elements together to produce an informative ‘title’ or ‘description’ field, be sure to store all the elements in their own fields as well so that data can be easily mined/restructured later on.
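The tips above can be sketched roughly as follows. This is only an illustration, not the project's actual API. The `EventScraper` class here is a hypothetical stand-in for the real base class, and the `scrape` and `build_event` methods, the `version` attribute, and the field names are all assumptions; the template in /scripts/pytemplates/example is the authoritative reference. The version is shown as a class attribute purely for illustration and may belong in your config file instead.

```python
# Hypothetical stand-in for the project's EventScraper base class.
# The real interface is described in the template, not here.
class EventScraper:
    def scrape(self):
        raise NotImplementedError


class ExampleScraper(EventScraper):
    # Assumed attribute: bump it whenever the produced data may have changed.
    version = "0.2"

    def __init__(self, rows):
        # rows: page elements already extracted into dicts (assumed shape)
        self.rows = rows

    def build_event(self, row):
        # Copy every raw element into its own field first...
        event = dict(row)
        # ...then add the joined, human-readable title alongside them.
        event["title"] = "{} at {}".format(row["performer"], row["venue"])
        event["scraper_version"] = self.version
        return event

    def scrape(self):
        return [self.build_event(row) for row in self.rows]


rows = [{"performer": "Jane Doe", "venue": "City Hall", "date": "2010-09-13"}]
events = ExampleScraper(rows).scrape()
```

Because `performer` and `venue` are kept next to the joined `title`, the data can still be mined or restructured later without parsing the title string.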