Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix modified utf-8 issues #38

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

wboult
Copy link

@wboult wboult commented Jul 26, 2020

I recently bumped into JNI's usage of modified UTF-8, which was causing the parser / expander to error and hang when I passed in any address which contained \0, \u0000 or any 4 byte UTF-8 character.

I did a bit of reading up on the way to work around the issue (this blog was quite instructive: http://banachowski.com/deprogramming/2012/02/working-around-jni-utf-8-strings/), and had a crack at making a fix to avoid the usage of NewStringUTF and GetStringUTFChars and instead passing jbyteArray into and out of the JNI code.

I've added some new tests which were failing before the change, and are now passing for me. This was the first time going anywhere near C code for me so would be really appreciative if someone could check I'm not doing anything really stupid or dangerous e.g. not releasing memory (I can see this project has been dormant for a while so will understand if that's not possible!).

This should fix
#36

And also hopefully is the more general solution you referenced in this other pull request:
#22

Noteworthy things:

  1. If you pass in a \0 or \u0000 character in the middle of your address string the rest of the string will be truncated (I assume this is because C uses NUL terminated char arrays so it just stops at a NUL char). Removing these from the middle of an address felt like something that should be done as an upfront step by the user rather than in jpostal

  2. There are still usages of GetStringUTFChars and NewStringUTF remaining, but they are for strings which should shouldn't contain problematic characters, e.g. languages and other parser options. I've only changed how the address input string is handled

@wboult wboult force-pushed the fix-modified-utf8-issues branch from 49d6ae5 to 0528780 Compare July 26, 2020 15:27
@wboult
Copy link
Author

wboult commented Jul 26, 2020

@albarrentine build passes on JDK 8 but not on JDK 7 due to an issue with protocol being used to download gradle wrapper, I tried upgrading to the latest version of gradle but it didn't help

@wboult wboult force-pushed the fix-modified-utf8-issues branch 2 times, most recently from cda52fd to 238bed7 Compare August 3, 2020 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant