Readability
Erik Rose edited this page Mar 22, 2016 · 2 revisions
Both [Safari](http://www.theregister.co.uk/2010/06/08/safari_reader_based_on_open_source_project/) and Firefox use it. Safari Reader was built on the Readability code.
Readability is Apache-2 licensed.
The outline below is roughly accurate as of the Arc90 Labs version. It needs to be updated to reflect the changes Mozilla has made since then.
- Rip out unlikely elements by id and class, such as "comment", "disqus", "menu", etc. (unless they're on the body tag).
- Turn divs that don't contain any block elements into p tags.
- Score using…
  - Length of paragraphs
  - Number of commas (?!)
- Scale scores by link density (the fraction of a node's text that sits inside links).
- Prepend and append sibling nodes of the winner (the top-scoring node) if…
  - Their scores are ≥1/5 of the winner's, or
  - They're at least 80 chars long and have low link density, or
  - They're short but have no links and contain at least one thing that looks like a sentence.
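The steps above can be sketched in Python roughly as follows. This is a minimal, hypothetical illustration, not the Arc90 or Mozilla code: `Node` is a made-up stand-in for a DOM element, and every constant (the unlikely-name pattern, the block-tag set, the 25-character paragraph cutoff, one point per paragraph plus one per comma plus up to three for length, half credit to the grandparent, the 0.25 "low link density" cutoff, and the sentence regex) is an assumption chosen to mirror the list. Refinements in the real code, such as tag-based starting scores, are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed values chosen to illustrate the outline; not copied from Readability.
UNLIKELY = re.compile(r"comment|disqus|menu|footer|sidebar", re.I)
BLOCK_TAGS = {"blockquote", "dl", "div", "ol", "p", "pre", "table", "ul"}
SENTENCE = re.compile(r"\.( |$)")  # a period followed by a space or the end

@dataclass(eq=False)  # eq=False: nodes compare by identity, not contents
class Node:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    text: str = ""  # text directly inside this element, excluding children
    ident: str = ""  # id and class attributes joined into one string
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)
    score: float = 0.0

    def add(self, *kids: Node) -> Node:
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self

    def inner_text(self) -> str:
        parts = [self.text] + [c.inner_text() for c in self.children]
        return " ".join(p for p in parts if p)

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

def link_density(node: Node) -> float:
    """Fraction of a node's text that sits inside <a> elements."""
    total = len(node.inner_text())
    linked = sum(len(a.inner_text()) for a in node.descendants() if a.tag == "a")
    return linked / total if total else 0.0

def prepare(node: Node) -> None:
    """Rip out unlikely children and turn block-free divs into p tags."""
    kept = []
    for child in node.children:
        if child.tag != "body" and UNLIKELY.search(child.ident):
            continue
        prepare(child)
        if child.tag == "div" and not any(d.tag in BLOCK_TAGS for d in child.descendants()):
            child.tag = "p"
        kept.append(child)
    node.children = kept

def score_candidates(root: Node) -> list[Node]:
    """Credit each paragraph's parent in full and its grandparent at half."""
    candidates: list[Node] = []
    for p in [n for n in root.descendants() if n.tag == "p"]:
        text = p.inner_text()
        if len(text) < 25:  # assumed cutoff for "too short to count"
            continue
        points = 1 + text.count(",") + min(len(text) // 100, 3)
        for ancestor, weight in ((p.parent, 1.0), (p.parent.parent, 0.5)):
            if ancestor is None:
                continue
            if ancestor not in candidates:
                candidates.append(ancestor)
            ancestor.score += points * weight
    for c in candidates:
        c.score *= 1 - link_density(c)  # scale scores by link density
    return candidates

def extract(root: Node) -> list[Node]:
    """Return the winner plus qualifying siblings, in document order."""
    prepare(root)
    candidates = score_candidates(root)
    if not candidates:
        return []
    winner = max(candidates, key=lambda n: n.score)
    siblings = winner.parent.children if winner.parent else [winner]
    threshold = winner.score / 5
    kept = []
    for sib in siblings:
        text, density = sib.inner_text(), link_density(sib)
        is_p = sib.tag == "p"
        if (sib is winner or sib.score >= threshold
                or (is_p and len(text) >= 80 and density < 0.25)  # assumed "low"
                or (is_p and len(text) < 80 and density == 0 and SENTENCE.search(text))):
            kept.append(sib)
    return kept

body = Node("body").add(
    Node("div", ident="menu").add(Node("a", "Home")),
    Node("div", ident="story").add(
        Node("p", "First paragraph, with commas, and enough text to count as content here."),
        Node("p", "Second paragraph, also long enough, to be scored by the algorithm."),
    ),
    Node("p", "Written by a staff reporter."),
    Node("p").add(Node("a", "Share this story")),
)
result = extract(body)
print([(n.tag, n.ident, n.score) for n in result])  # → [('div', 'story', 6.0), ('p', '', 0.0)]
```

On this sample page the `menu` div is ripped out, the `story` div wins with a score of 6, and of its siblings only the short, link-free byline is kept; the link-only "Share this story" paragraph is dropped. Walking the winner's parent's children in order means kept siblings land before or after the winner exactly where they appeared, which is all that prepending and appending amounts to here.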
For our purposes, we would need…
- Some output even if we aren't confident that it's the main content. Err on the side of too much.
- No automatic traversal to additional pages of paginated content.
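One hypothetical way to meet the "some output, err on the side of too much" requirement is to return the top candidate only when its score clears a confidence bar, and otherwise fall back to the whole page's text, so a non-empty page never yields empty output. The function name `choose_output` and the default cutoff of 20 are made-up placeholders, not values from Readability.

```python
def choose_output(candidates: list[tuple[str, float]], page_text: str,
                  min_confidence: float = 20.0) -> str:
    """Return the best candidate's text, or the whole page when unsure.

    `candidates` holds (text, score) pairs; `min_confidence` is an assumed,
    hypothetical cutoff. Falling back to `page_text` errs toward too much.
    """
    if candidates:
        text, score = max(candidates, key=lambda c: c[1])
        if score >= min_confidence and text.strip():
            return text
    return page_text  # not confident: give back everything rather than nothing


print(choose_output([("main story", 25.0), ("nav", 3.0)], "whole page"))  # → main story
print(choose_output([("main story", 5.0)], "whole page"))  # → whole page
```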