I recently bumped into JNI's usage of modified UTF-8, which was causing the parser / expander to error and hang when I passed in any address which contained \0, \u0000 or any 4 byte UTF-8 character.
I did a bit of reading up on the way to work around the issue (this blog was quite instructive: http://banachowski.com/deprogramming/2012/02/working-around-jni-utf-8-strings/), and had a crack at making a fix to avoid the usage of `NewStringUTF` and `GetStringUTFChars` and instead passing `jbyteArray` into and out of the JNI code.

I've added some new tests which were failing before the change, and are now passing for me. This was the first time going anywhere near C code for me so would be really appreciative if someone could check I'm not doing anything really stupid or dangerous e.g. not releasing memory (I can see this project has been dormant for a while so will understand if that's not possible!).
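To illustrate why the `jbyteArray` route helps, here is a minimal Java-side sketch comparing standard UTF-8 with modified UTF-8. It is not the code from this change: the class and method names are hypothetical, and `DataOutputStream.writeUTF` is used only as a convenient stand-in for the modified UTF-8 encoding, on the assumption that it matches what the JNI string functions produce.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class Utf8Encodings {
    // Standard UTF-8: the bytes that could be handed to native code as a byte[].
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode standard UTF-8 bytes coming back from native code.
    static String fromStandardUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF.
    // writeUTF prepends a 2-byte length, which is dropped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        String emoji = "\uD83D\uDE00"; // U+1F600, a 4-byte character in standard UTF-8

        System.out.println(standardUtf8(nul).length);   // 1 byte: 0x00
        System.out.println(modifiedUtf8(nul).length);   // 2 bytes: 0xC0 0x80
        System.out.println(standardUtf8(emoji).length); // 4 bytes
        System.out.println(modifiedUtf8(emoji).length); // 6 bytes: two 3-byte surrogates
    }
}
```

The two encodings differ exactly on NUL and on characters outside the Basic Multilingual Plane, which are the inputs that were failing. Encoding to a `byte[]` with `StandardCharsets.UTF_8` on the Java side sidesteps the difference.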
This should fix #36, and also hopefully is the more general solution you referenced in this other pull request: #22
Noteworthy things:
- If you pass in a \0 or \u0000 character in the middle of your address string, the rest of the string will be truncated (I assume this is because C uses NUL-terminated char arrays, so it just stops at a NUL char). Removing these from the middle of an address felt like something that should be done as an upfront step by the user rather than in jpostal.
- There are still usages of `GetStringUTFChars` and `NewStringUTF` remaining, but they are for strings which shouldn't contain problematic characters, e.g. languages and other parser options. I've only changed how the address input string is handled.
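For callers who want to avoid the truncation described in the first point, a rough sketch of the upfront cleaning step might look like the following. The class and method names are hypothetical and not part of jpostal. The second method only simulates, in Java, how a NUL-terminated C string would be read.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class NulHandling {
    // Remove every U+0000 so the UTF-8 bytes contain no 0x00.
    static String stripNul(String address) {
        return address.replace("\u0000", "");
    }

    // Simulate a C string read: stop at the first 0x00 byte.
    // In standard UTF-8, a 0x00 byte only ever encodes U+0000.
    static String asCStringWouldSee(String address) {
        byte[] bytes = address.getBytes(StandardCharsets.UTF_8);
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "123 Main\u0000 St";
        System.out.println(asCStringWouldSee(raw));           // 123 Main
        System.out.println(asCStringWouldSee(stripNul(raw))); // 123 Main St
    }
}
```

Whether to delete the NUL or replace it with a space is a judgment call that depends on where the input came from, which is part of why it seems better left to the caller.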