Bookmarks tagged [web-content-extracting]
https://github.com/Alir3z4/html2text
Convert HTML to Markdown-formatted text.
https://github.com/michaelhelmick/lassie
Web content retrieval for humans: fetches basic content (title, description, images, videos) from a URL.
https://github.com/coleifer/micawber
A small library for extracting rich content (embeds, thumbnails, metadata) from URLs via oEmbed.
https://github.com/codelucas/newspaper
News extraction, article extraction and content curation in Python.
https://github.com/buriy/python-readability
Fast Python port of arc90's readability tool.
https://github.com/kennethreitz/requests-html
Pythonic HTML Parsing for Humans.
https://github.com/miso-belica/sumy
A module for automatic summarization of text documents and HTML pages.
https://github.com/deanmalmgren/textract
Extract text from any document: Word, PowerPoint, PDFs, etc.
https://github.com/gaojiuli/toapi
Make every web site provide APIs (scrape pages into an API service).