- BUG FIX: Preprocessing doesn't happen when a DOM tree is supplied to jusText.
- INCOMPATIBLE CHANGE: Stop words are case insensitive.
- BUG FIX: Function `decode_html` now respects parameter `errors` when falling back to `default_encoding` #9.
- FEATURE: Added XPath selector to the paragraphs. XPath selector is also available in detailed output as the `xpath` attribute of the `<p>` tag #5.
- FEATURE: Added pluggable DOM preprocessor.
- FEATURE: Added support for Python 3.2+.
- INCOMPATIBLE CHANGE: Paragraphs are instances of `justext.paragraph.Paragraph`.
- INCOMPATIBLE CHANGE: Script 'justext' removed in favour of command `python -m justext`.
- FEATURE: It's possible to enter a URI as input document in CLI.
- FEATURE: It is possible to pass a Unicode string directly.
- FEATURE: Character counts are used instead of word counts where possible, so that the algorithm works well in language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.).
- BUG FIX: More robust parsing of meta tags that declare the charset in use.
- BUG FIX: Corrected decoding of HTML entities `&#128;` to `&#159;` (€ to Ÿ).
- First public release.
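
The entry above about character counts replacing word counts can be illustrated with a small sketch. The helpers `word_count` and `char_count` are hypothetical names chosen for this example and are not part of jusText; the sketch only shows why whitespace-based word counting breaks down for text written without spaces, while a character count still reflects the amount of text.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# taken from the jusText code base.

def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def char_count(text):
    """Count characters, ignoring whitespace."""
    return sum(1 for ch in text if not ch.isspace())


english = "The quick brown fox jumps over the lazy dog"
japanese = "素早い茶色の狐がのろまな犬を飛び越える"

for label, text in (("English", english), ("Japanese", japanese)):
    # The Japanese sentence contains no spaces, so it is seen as one "word",
    # even though it carries a comparable amount of text.
    print(label, "words:", word_count(text), "characters:", char_count(text))
```

Under this assumption, any length threshold expressed in words would treat the Japanese sentence as a single-word paragraph, whereas a threshold expressed in characters treats both sentences as text of comparable size.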