Commit
* Add method to determine if a specific encoding is multi byte
* Add has_submatch property on a match
* Disallow calling __eq__ on a different object type
* Add percent_chaos and percent_coherence
  Better readability on chaos and coherence instead of a ratio between 0. and 1.
* Coherence ratio based on mean instead of sum of best results
* Using loguru for trace/debug <3
* best() method rewritten
* from_byte method improved
  - new parameters
  - debug available with loguru
  - probe chaos improved for hiragana and katakana
* Experimental, hook on UnicodeDecodeError
  Provide encoding detection on decoding error.
* Bump to 1.2.0, add loguru dependency
* Add CLI test: normalize a file without replacing it
Showing 10 changed files with 227 additions and 62 deletions.
@@ -0,0 +1,15 @@
from charset_normalizer.constant import MULTI_BYTE_DECODER
import importlib


def is_multi_byte_encoding(encoding_name):
    """
    Verify if a specific encoding is a multi byte one based on its IANA name
    :param str encoding_name: IANA encoding name
    :return: True if multi byte
    :rtype: bool
    """
    return issubclass(
        importlib.import_module('encodings.{encoding_name}'.format(encoding_name=encoding_name)).IncrementalDecoder,
        MULTI_BYTE_DECODER
    )
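The check above can be tried without installing the package. The following is a minimal sketch, assuming that `MULTI_BYTE_DECODER` can be approximated by CPython's `MultibyteIncrementalDecoder` (the constant's actual value is not shown in this diff, so that substitution is an assumption). The `encoding_name` argument has to match a module name under the standard `encodings` package, such as `big5` or `utf_8`.

```python
import importlib

# Assumption: stand-in for charset_normalizer.constant.MULTI_BYTE_DECODER,
# whose definition is not part of the diff shown here.
# _multibytecodec is a CPython implementation module, not a public API.
from _multibytecodec import MultibyteIncrementalDecoder


def is_multi_byte_encoding(encoding_name):
    """Return True if the codec's incremental decoder derives from the multi-byte decoder."""
    # Import the codec module, e.g. encodings.big5, by its module name.
    module = importlib.import_module('encodings.{}'.format(encoding_name))
    return issubclass(module.IncrementalDecoder, MultibyteIncrementalDecoder)


for name in ('big5', 'shift_jis', 'latin_1', 'utf_8'):
    print(name, is_multi_byte_encoding(name))
```

Under this stand-in, `utf_8` is reported as not multi byte, because its incremental decoder derives from `codecs.BufferedIncrementalDecoder` rather than from the multi-byte decoder class. A name that is not an `encodings` module name (for instance `utf-8` with a hyphen) raises `ModuleNotFoundError`.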
@@ -0,0 +1,14 @@
import sys
from charset_normalizer.legacy import detect


def charset_normalizer_hook(exctype, value, traceback):
    if exctype == UnicodeDecodeError:
        cp_detection = detect(value.object)
        if cp_detection['encoding'] is not None:
            value.reason = value.reason+'; you may want to consider {} codec for this sequence.'.format(cp_detection['encoding'])

    sys.__excepthook__(exctype, value, traceback)


sys.excepthook = charset_normalizer_hook
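The hook pattern above can be exercised in isolation. This is a sketch only: `guess_encoding` is a hypothetical stand-in for `charset_normalizer.legacy.detect` that tries a short, arbitrary list of codecs, and `annotate_decode_error`, `decode_error_hook`, and `install` are illustrative names that do not appear in the commit.

```python
import sys

# Hypothetical candidate list; the real detector is far more thorough.
CANDIDATES = ('utf_8', 'cp1252', 'latin_1')


def guess_encoding(payload):
    """Stand-in detector: return the first candidate codec that decodes payload strictly."""
    for name in CANDIDATES:
        try:
            payload.decode(name)
        except UnicodeDecodeError:
            continue
        return {'encoding': name}
    # Unreachable with latin_1 in the list, kept to mirror a detect()-style result.
    return {'encoding': None}


def annotate_decode_error(value, detector=guess_encoding):
    """Append a codec suggestion to the error's reason, based on the bytes that failed."""
    guess = detector(value.object)
    if guess['encoding'] is not None:
        value.reason += '; you may want to consider {} codec for this sequence.'.format(guess['encoding'])
    return value


def decode_error_hook(exctype, value, traceback):
    """Annotate uncaught UnicodeDecodeError, then defer to the default hook."""
    if issubclass(exctype, UnicodeDecodeError):
        annotate_decode_error(value)
    sys.__excepthook__(exctype, value, traceback)


def install():
    """Replace the interpreter-wide hook; call once at program start."""
    sys.excepthook = decode_error_hook
```

The sketch uses `issubclass` where the commit compares with `==`; both match `UnicodeDecodeError` itself, and `issubclass` would also cover subclasses. Because `sys.excepthook` only runs for uncaught exceptions, errors handled by a `try`/`except` are never annotated.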
@@ -13,13 +13,14 @@
 EMAIL = '[email protected]'
 AUTHOR = 'Ahmed TAHRI @Ousret'
 REQUIRES_PYTHON = '>=3.5.0'
-VERSION = '1.1.1'
+VERSION = '1.2.0'

 REQUIRED = [
     'cached_property',
     'dragonmapper',
     'zhon',
-    'prettytable'
+    'prettytable',
+    'loguru'
 ]

 EXTRAS = {