Ignoring locale settings and sun.jnu.encoding #344

Open
Schaka opened this issue Oct 14, 2024 · 10 comments

@Schaka

Schaka commented Oct 14, 2024

I hope I'm in the right place and this isn't directly related to GraalVM. So please excuse me if I'm wasting your time.
You can find all the code I'm talking about right here: https://github.com/Schaka/janitorr/tree/bazarr-support

The image is built using the Spring-Boot bootImage step via Gradle and I'm passing these ENV variables.

"BPE_DEFAULT_LANG" to "en_US.UTF-8",
"BPE_LANG" to "en_US.UTF-8",
"BPE_LC_ALL" to "en_US.UTF-8",
"JAVA_TOOL_OPTIONS" to """
    -Dsun.jnu.encoding=UTF-8
    -Dfile.encoding=UTF-8
""".trimIndent(),
"BP_NATIVE_IMAGE_BUILD_ARGUMENTS" to """
    -march=compatibility
    -H:+AddAllCharsets
    -Dsun.jnu.encoding=UTF-8
    -Dfile.encoding=UTF-8
""".trimIndent()

My host (Debian 12) has LANG set correctly and LC_ALL not set at all.
Following the docs, I also passed these variables to Docker via compose.yml.

According to the docs, output would not print correctly to the console (docker logs) otherwise, but it definitely seems to. Granted, I use logback rather than direct prints, so there is a chance that fixes things magically.

services:
  janitorr:
    container_name: janitorr
    image: ghcr.io/schaka/janitorr:native-amd64-80-merge
    user: 1000:1000
    ports:
      - 8978:8978 # Technically, we don't publish any endpoints, so this isn't strictly required
    volumes:
      - /appdata/janitorr/config/application.yml:/workspace/application.yml
      - /share_media:/data
    environment:
      - LC_ALL=en_US.UTF-8
      - LANG=en_US.UTF-8

Yet, the second I use Path.of("a path with an ümläüt"), I run into the following exception:

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /data/media/anime-movies/Nausicaä of the Valley of the Wind (1984) [imdbid-tt0087544]/Nausicaä of the Valley of the Wind (1984) [imdbid-tt0087544] - [Bluray-1080p][FLAC 2.0][x264].mkv
	at java.base@23/sun.nio.fs.UnixPath.encode(UnixPath.java:131) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
	at java.base@23/sun.nio.fs.UnixPath.<init>(UnixPath.java:77) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
	at java.base@23/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:312) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
	at java.base@23/java.nio.file.Path.of(Path.java:148) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
	at com.github.schaka.janitorr.mediaserver.AbstractMediaServerService.pathStructure$janitorr(AbstractMediaServerService.kt:71) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
	at com.github.schaka.janitorr.mediaserver.AbstractMediaServerService.createLinks(AbstractMediaServerService.kt:99) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]

Is there something I'm missing here, or could this be a bug in GraalVM somehow?
Looking at the code, UnixFileSystem definitely reads sun.jnu.encoding. The filepath is received as a valid UTF-8 string via REST.
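
To illustrate the failure mode, here is a minimal sketch. It assumes that the path is encoded with a strict encoder for the charset named by sun.jnu.encoding; the class and method names (StrictPathEncodingDemo, canEncode) are made up for this example and are not JDK code. ANSI_X3.4-1968 is an alias for US-ASCII, which has no mapping for "ä".

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not the JDK's UnixPath: shows why a strict
// US-ASCII encoder rejects a path that a UTF-8 encoder accepts.
class StrictPathEncodingDemo {

    // Returns true if every character of the path can be encoded in the
    // given charset. REPORT makes the encoder throw instead of silently
    // substituting '?' for unmappable characters.
    static boolean canEncode(String path, Charset charset) {
        try {
            charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(path));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e4 is "ä"; the escape keeps the source file ASCII-only.
        String path = "/data/media/Nausica\u00e4";
        System.out.println("US-ASCII: " + canEncode(path, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + canEncode(path, StandardCharsets.UTF_8));
    }
}
```

If that assumption holds, the exception is explained by the effective sun.jnu.encoding alone, regardless of what file.encoding reports.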

Logging from within the image provides:

2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : Default charset UTF-8
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : sun.jnu.encoding ANSI_X3.4-1968
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : sun.stdout.encoding null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : sun.stderr.encoding null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV JAVA_TOOL_OPTIONS null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV LANG en_US.UTF-8
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV LANGUAGE null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV LC_ALL en_US.UTF-8
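
For anyone who wants to collect the same values in their own image, a rough sketch follows. It is not the RuntimeEnvironment class from the linked repository; RuntimeEncodingReport is a hypothetical name, and it prints to stdout instead of going through logback.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers the encoding-related properties and
// environment variables shown in the log lines above.
class RuntimeEncodingReport {

    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("Default charset", Charset.defaultCharset().name());
        // System properties; String.valueOf turns a missing value into "null".
        for (String key : new String[] {"sun.jnu.encoding", "sun.stdout.encoding", "sun.stderr.encoding"}) {
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        // Environment variables as seen by the running process.
        for (String key : new String[] {"JAVA_TOOL_OPTIONS", "LANG", "LANGUAGE", "LC_ALL"}) {
            report.put("ENV " + key, String.valueOf(System.getenv(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " " + value));
    }
}
```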
@dmikusa
Contributor

dmikusa commented Oct 14, 2024

Did you try with -J flag to native-image?

-J<flag> pass <flag> directly to the JVM running the image generator

So if you're setting additional args like ...

"BP_NATIVE_IMAGE_BUILD_ARGUMENTS" to """
-march=compatibility
-H:+AddAllCharsets
-Dsun.jnu.encoding=UTF-8
-Dfile.encoding=UTF-8

Then try -J-Dsun.jnu.encoding=UTF-8 and -J-Dfile.encoding=UTF-8 (instead of those args without the -J). That should pass those args through to the build time. I see some evidence of that helping similar issues here.

In general, what I suggest with something like this is to get it working without buildpacks. So make it build and work when calling native-image directly (or with their gradle tools) on your machine. When it's working well that way, look at the flags you had to pass to native-image and then update BP_NATIVE_IMAGE_BUILD_ARGUMENTS accordingly.

The buildpack is just installing & running native-image for you. It attempts to add some basic arguments that you'll need, but beyond that, it's up to you to pass additional arguments through.

@Schaka
Author

Schaka commented Oct 14, 2024

I did test it without buildpacks about an hour ago and made a small sample project.
I was about to update the issue and figured it may be buildpack related.
I've also considered it may be a problem with Adoptium or Bellsoft.

Here's a small test project:
graalvm-test.zip

If you replace the base image in the Dockerfile with 21 instead of 23, you can start it with any combination of parameters. They end up not mattering and you always get the same error. With 23 it just works and you don't have to set encoding parameters at all.

I've given the -J flag a try and had no success.

I've managed to set -Dsun.jnu.encoding=UTF-8 and read it back from within the image by adjusting the CMD (according to the docs), but the actual value seemed to be completely ignored.

@dmikusa
Contributor

dmikusa commented Oct 14, 2024

This doesn't seem like something that's specific to buildpacks, if I'm following your tests here. If you can reproduce it using the standard Dockerfiles, that's a behavior with native-image itself.

All I can suggest is that we do have Java 23 available in buildpacks: https://github.com/paketo-buildpacks/bellsoft-liberica/releases/tag/v10.9.0 has Java 23, and that was pulled into last week's release, https://github.com/paketo-buildpacks/java/releases/tag/v16.1.0. So if using Java 23 works with your Dockerfile sample, I'd bet it works with buildpacks too.

@Schaka
Author

Schaka commented Oct 14, 2024

So I had already been using Java 23 for a while.
This is my log output in that regard:

$JAVA_TOOL_OPTIONS                                                                        the JVM launch flags
    [creator]         Using Java version 23 from BP_JVM_VERSION
    [creator]       BellSoft Liberica NIK 23.0.0: Contributing to layer

I now added "paketobuildpacks/oracle". I was hoping there was a way to use Oracle's GraalVM as a base image directly and this seems to be it.

Unfortunately, the result is still the same. The resulting image cannot use Path.of("/umläut") without an exception being thrown.
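
A small probe like the following can be compiled into a native image to check this quickly in different base images. It is only a sketch; PathEncodingProbe and canCreatePath are hypothetical names, not part of the sample projects attached to this issue.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical probe: reports whether Path.of accepts a non-ASCII name
// in the current runtime, alongside the effective sun.jnu.encoding.
class PathEncodingProbe {

    // Returns false instead of propagating InvalidPathException.
    static boolean canCreatePath(String candidate) {
        try {
            Path.of(candidate);
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e4 is "ä"; the escape keeps the source file ASCII-only.
        String candidate = "/uml\u00e4ut";
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("Path.of accepts non-ASCII: " + canCreatePath(candidate));
    }
}
```

The result depends on the environment it runs in, which is the point: the same binary logic can be compared across the Dockerfile build and the buildpack build.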

If I can recreate it in a sample project specifically built around paketo's buildpacks, will you look into it?

@dmikusa
Contributor

dmikusa commented Oct 14, 2024

What we'd need to look into this is something like your sample project from before that works when you build with a Dockerfile or just on the local machine, but does not work when building the same source code with buildpacks. If you have a sample like that, I can take a look.

@Schaka
Author

Schaka commented Oct 14, 2024

native-image-error.zip

If you do gradle bootBuildImage and docker run native-image-error, you'll see the error.
It will not error if you just run it locally, only in the image produced by buildpacks.

You can even try docker run native-image-error -Dsun.jnu.encoding=UTF-8.
I think it's some form of base image that doesn't accept or delegate the LANG or LC_ALL env variables or the base image is one where GraalVM doesn't interpret them correctly at build time.

As you can see in my previous example, it works just fine using Oracle's base image with javac and native-image. I'm at a bit of a loss trying to figure out what's different.

You guys do amazing work, so I wouldn't be surprised if I fucked up somehow.

@dmikusa
Contributor

dmikusa commented Oct 15, 2024

A couple of observations...

  • ghcr.io/graalvm/graalvm-community:21 doesn't seem to have locales installed correctly; bash in that image can't even display a filename with an umlaut correctly. Conversely, ghcr.io/graalvm/graalvm-community:23 does, and bash works correctly there. I don't think this is a difference in native-image, just the container image.

  • See oracle/graal#9504 ([GR-57284] JAVA_TOOL_OPTIONS environment variable in native executables): JAVA_TOOL_OPTIONS won't work when you run your image, because a native-image binary doesn't look at this env variable. A native-image binary does support reading system properties from arguments to the application though, so that's why it picks them up when you run your application image passing the system props as args to the binary.

  • As far as I can tell the Paketo base/run images have locale set up correctly. At least with the base & full images, I can exec into the container, run locale and see that it's reporting en_US.UTF-8. I can use UTF-8 codes in file names with bash and everything seems to work OK there. It also has been reported to work with a JVM and other language runtimes. It is just with the native-image app where I have trouble with Path.of. If we are missing a package to make this work, let me know and we can look at adding it. I would just need to know the Ubuntu package to install (or possibly more details on what GraalVM needs, and I could try to find out what Ubuntu package provides that).

  • I tried with Oracle GraalVM 21 & 23. Same results. No difference there.

@Schaka
Author

Schaka commented Oct 15, 2024

I've also created an issue over at GraalVM. The package they use seems to be glibc-all-langpacks.

I can't find the Ubuntu/Debian equivalent. Maybe they can shine a light on it.

The locales package is available on Ubuntu 22, but that should be installed by default.

Having a quick look, at most I can tell that Oracle Linux 9 supplies glibc-all-langpacks 2.34, whereas Ubuntu's locale uses 2.35. Hopefully that one minor version isn't what breaks it here.

Edit: Out of curiosity, I've created a builder from a build and run image using graalvm-community:23, specifically added the glibc-all-langpacks and it still results in the same error. I'm guessing I'd have to adjust the buildpacks too.

@Schaka
Author

Schaka commented Oct 18, 2024

I ended up using --patch-module: I recompiled the sun.nio.fs.Util class myself with UTF-8 forced and copied it (hackily) into a folder structure that the compiler accepts, and the native image accepted it without issue.

native-image-error.zip

It's incredibly hacky but may solve this for anyone else that needs a temporary (haha) solution.
The real problem still needs fixing, but I suspect it'll be a while before the GraalVM team gives a definitive answer as to what is actually required and how the system property is set under the hood.
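
For readers wondering what "UTF-8 forced" amounts to, here is a rough illustration only. It is a standalone stand-in (hypothetical name JnuEncodingSketch), not the JDK's sun.nio.fs.Util source and not the code in the attached zip. It assumes the unpatched class derives its charset from sun.jnu.encoding, as the earlier reading of the code suggests.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in contrasting the two behaviours; the real class
// lives in java.base and can only be replaced via --patch-module.
class JnuEncodingSketch {

    // Assumed unpatched behaviour: resolve the charset named by the
    // sun.jnu.encoding system property, falling back to the default
    // charset if the property is missing or names an unknown charset.
    static Charset fromSystemProperty() {
        String name = System.getProperty("sun.jnu.encoding");
        if (name == null) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            return Charset.defaultCharset();
        }
    }

    // Patched behaviour as described above: ignore the property entirely.
    static Charset forcedUtf8() {
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println("from property: " + fromSystemProperty());
        System.out.println("forced:        " + forcedUtf8());
    }
}
```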

@dmikusa
Contributor

dmikusa commented Oct 18, 2024

Thanks for sharing! I'm subscribed on the GraalVM issue, so I'll keep an eye on what they say and if there's anything we can do to make this just work, I'll open up issues.
