Fix modified utf-8 issues #38

wboult · 2020-07-26T14:46:54Z

I recently ran into JNI's use of modified UTF-8, which was causing the parser / expander to error and hang whenever I passed in an address containing \0, \u0000 or any 4-byte UTF-8 character.
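For anyone unfamiliar with the difference, here is a minimal sketch (not code from this PR) comparing standard UTF-8 with Java's modified UTF-8. It assumes `DataOutputStream.writeUTF` is an acceptable stand-in for what the JNI string functions expect, since both use the modified encoding; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of jpostal.
final class ModifiedUtf8Demo {

    // Standard UTF-8: NUL is one byte, supplementary characters are four bytes.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as produced by DataOutputStream.writeUTF.
    // writeUTF prepends a 2-byte length, which is dropped here.
    // Note: writeUTF rejects strings whose encoding exceeds 65535 bytes.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String nul = "\u0000";          // U+0000
        String emoji = "\uD83D\uDE00";  // U+1F600, a surrogate pair in Java

        // NUL: 1 byte in standard UTF-8, 2 bytes (0xC0 0x80) in modified UTF-8.
        System.out.println(standardUtf8(nul).length + " vs " + modifiedUtf8(nul).length);

        // U+1F600: 4 bytes in standard UTF-8, 6 bytes (two encoded surrogates) in modified UTF-8.
        System.out.println(standardUtf8(emoji).length + " vs " + modifiedUtf8(emoji).length);
    }
}
```

A C library that expects standard UTF-8 will not recognise either of the modified forms, which is consistent with the failures described above.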

I did a bit of reading up on how to work around the issue (this blog post was quite instructive: http://banachowski.com/deprogramming/2012/02/working-around-jni-utf-8-strings/), and had a crack at a fix that avoids NewStringUTF and GetStringUTFChars and instead passes a jbyteArray into and out of the JNI code.
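Roughly, the Java side of that approach looks like the sketch below. This is an illustration rather than the actual diff: `echoNative` is a hypothetical pure-Java stand-in for a native method that takes and returns standard UTF-8 bytes, so the example can run without the native library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the byte[] boundary, not jpostal's real API.
final class Utf8Boundary {

    // Stand-in for a native method declared as, e.g.,
    //   private static native byte[] someNativeCall(byte[] utf8);
    // Here it simply copies its input so the sketch is runnable.
    static byte[] echoNative(byte[] utf8) {
        return utf8.clone();
    }

    // Encode to standard UTF-8 before crossing into native code,
    // and decode the returned bytes the same way.
    static String roundTrip(String address) {
        byte[] in = address.getBytes(StandardCharsets.UTF_8);
        byte[] out = echoNative(in);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String address = "12 Rue du Caf\u00e9 \uD83D\uDE00";
        // The 4-byte character survives because no modified UTF-8 is involved.
        System.out.println(roundTrip(address).equals(address));
    }
}
```

On the C side the array then has to be copied out and released explicitly, which is the part I'd most like a second pair of eyes on.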

I've added some new tests which were failing before the change and are now passing for me. This was my first time going anywhere near C code, so I'd be really appreciative if someone could check I'm not doing anything really stupid or dangerous, e.g. not releasing memory (I can see this project has been dormant for a while, so I'll understand if that's not possible!).

This should fix #36.

Hopefully it is also the more general solution you referenced in this other pull request: #22.

Noteworthy things:

If you pass in a \0 or \u0000 character in the middle of your address string, the rest of the string will be truncated (I assume this is because C uses NUL-terminated char arrays, so it just stops at a NUL char). Removing these from the middle of an address felt like something that should be done as an upfront step by the user rather than in jpostal.
There are still usages of GetStringUTFChars and NewStringUTF remaining, but they are for strings which shouldn't contain problematic characters, e.g. languages and other parser options. I've only changed how the address input string is handled.
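For the first point, the upfront step on the caller's side could be as small as the sketch below. The class and method names are hypothetical and nothing like this is added to jpostal by this PR.

```java
// Hypothetical caller-side helper, not part of jpostal.
final class AddressSanitizer {

    // Remove every U+0000 so that a NUL-terminated C string
    // does not cut the address short.
    static String stripNul(String address) {
        return address.replace("\u0000", "");
    }

    public static void main(String[] args) {
        String raw = "12 Main\u0000 St";
        System.out.println(stripNul(raw)); // prints "12 Main St"
    }
}
```

Whether to drop the NUL or replace it with a space is a judgement call that depends on where the input came from, which is another reason to leave it to the caller.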

wboult · 2020-07-26T15:32:38Z

@albarrentine the build passes on JDK 8 but not on JDK 7, due to an issue with the protocol used to download the Gradle wrapper. I tried upgrading to the latest version of Gradle but it didn't help.

wboult added 2 commits July 26, 2020 15:20

Fixing modified utf8 issue

0944704

Adding tests for NUL and 4 byte issue

0528780

wboult force-pushed the fix-modified-utf8-issues branch from 49d6ae5 to 0528780 Compare July 26, 2020 15:27

Trying to fix TLS issue

238bed7

wboult force-pushed the fix-modified-utf8-issues branch 2 times, most recently from cda52fd to 238bed7 Compare August 3, 2020 08:05

macarran mentioned this pull request Dec 3, 2024

fix utf8 issues in jpostal OvertureMaps/jpostal#1

Open
