Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Custom UTF-8 Decoder #885
base: master
Are you sure you want to change the base?
Implement Custom UTF-8 Decoder #885
Changes from 7 commits
cfeb127
e083376
4a54532
775f1ce
18e6080
6d8e314
8515899
51525ae
1f5f3eb
3d1093a
a5e46ae
3105843
7481274
60b3ca6
272770d
770955c
a354b30
960f2c0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a function that's signature is analog to
extract_ascii_strings
, i.e. operate on buffers?this can then be wrapped by this or a similar function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add tests for various encoded strings to show they're properly decoded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you use
ignore
here? wouldn't we want to handle the case that invalid UTF-8 data is encountered (and not extract a string there)?I assume that your algorithm works pretty well, since you've opened the PR, but I can't quite follow how it works. Would you please add some comments explaining the design, and definitely a few test cases that exercise each of the branch arms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, the tests for each branch are in
tests/test_utf8_decoder.py
. Let me know if anything else is required.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the other cases? are there any?
either way, please add an
else
branch to handle them, either logging or doing an assertion.