Build out the name table decoding function to cover all platform/encodings #85
Is this why … gets output by lib-font as …? I'm testing this with https://github.com/IBM/plex/blob/master/IBM-Plex-Sans-Thai/fonts/complete/ttf/IBMPlexSansThai-Light.ttf
That sounds more like whatever is rendering that text is not set to render it as UTF-8. Where are you seeing this output? =)
I'm seeing this on the command line, but also in the Wakamai Fondue output. This is the test script I used to verify that this already happens on the lib-font side (as opposed to some data kneading on the WF side):

```javascript
import { Font } from "./lib-font.js";

const font = new Font("testfont");
font.src = "./IBMPlexSansThai-Light.ttf";
font.onload = (evt) => {
  const loaded = evt.detail.font;
  const { name } = loaded.opentype.tables;
  // nameID 9 is the designer name
  console.log(name.get(9));
};
```
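To take the terminal out of the equation entirely, one option is to print the string with every non-ASCII code unit escaped, so what you see is exactly what lib-font returned. A minimal sketch; `escapeNonAscii` is a hypothetical helper, not part of lib-font:

```javascript
// Hypothetical helper (not part of lib-font): replace every UTF-16 code unit
// outside printable ASCII with a \uXXXX escape, so the terminal's encoding
// settings cannot change what is displayed.
function escapeNonAscii(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    out += code >= 0x20 && code < 0x7f
      ? str[i]
      : "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

// In the script above: console.log(escapeNonAscii(name.get(9)));
console.log(escapeNonAscii("r\u009Ambe")); // prints r\u009Ambe
console.log(escapeNonAscii("r\u00F6mbe")); // prints r\u00F6mbe
```

A C1 control such as `\u009A` in the output points at a decoding problem; a proper letter escape such as `\u00F6` means the string is fine and only the terminal is at fault.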
Has there been any progress on this issue? I think I'm running into the same issue as @RoelN, with non-ASCII characters showing up as the "?" character-not-found glyph, but I don't know enough about the encoding side of things to try and fix this myself 😓.
The command line is notoriously bad at UTF-8, so here's a small change to make this easier to test:
Yields
The bad character there is 0x009A (U+009A, a C1 control character rather than a letter), which is clearly wrong. Let's do some byte checks. Throwing this into an inspector gives us the following data block to inspect:
And indeed, those last few bytes are "r", 0x9A, "m", "b", "e", "r", and "g" when read as single-byte characters, so it's not a matter of reading the bytes wrong. We also see that this record uses platformID=1, platEncID=0, and langID=0, which means we should be treating it as Mac/Roman/English. Sooooo, we look up that encoding and find that 0x9A should be "ö". So this is definitely a decoding issue.
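For this particular record the fix is a table lookup: Mac OS Roman matches ASCII for bytes 0x00-0x7F, and the upper half maps through a fixed 128-entry table. A minimal sketch with a hand-written table following Apple's Mac OS Roman mapping; `decodeMacRoman` and `MAC_ROMAN_HIGH` are hypothetical names, not lib-font code:

```javascript
// Upper half (0x80-0xFF) of Mac OS Roman, 16 characters per row.
// Bytes 0x00-0x7F are identical to ASCII and need no table.
const MAC_ROMAN_HIGH =
  "ÄÅÇÉÑÖÜáàâäãåçéè" + // 0x80-0x8F
  "êëíìîïñóòôöõúùûü" + // 0x90-0x9F (0x9A is "ö")
  "†°¢£§•¶ß®©™´¨≠ÆØ" + // 0xA0-0xAF
  "∞±≤≥¥\u00B5∂∑∏π∫ªº\u03A9æø" + // 0xB0-0xBF
  "¿¡¬√ƒ≈\u2206«»…\u00A0ÀÃÕŒœ" + // 0xC0-0xCF (0xCA is a no-break space)
  "–\u2014“”‘’÷◊ÿŸ⁄€‹›\uFB01\uFB02" + // 0xD0-0xDF (0xDB is the euro sign in Apple's current mapping)
  "‡·‚„‰ÂÊÁËÈÍÎÏÌÓÔ" + // 0xE0-0xEF
  "\uF8FFÒÚÛÙıˆ˜¯˘˙˚¸˝˛ˇ"; // 0xF0-0xFF (0xF0 is the Apple logo, U+F8FF)

// Decode a Mac/Roman (platformID 1, encodingID 0) name string.
function decodeMacRoman(bytes) {
  let out = "";
  for (const b of bytes) {
    out += b < 0x80 ? String.fromCharCode(b) : MAC_ROMAN_HIGH[b - 0x80];
  }
  return out;
}

// The trailing bytes from the record above: "r", 0x9A, "m", "b", "e", "r", "g".
console.log(decodeMacRoman(new Uint8Array([0x72, 0x9a, 0x6d, 0x62, 0x65, 0x72, 0x67]))); // römberg
```

In browsers, `new TextDecoder("macintosh")` should give the same result without a hand-written table, since the WHATWG Encoding Standard includes Mac OS Roman as a legacy single-byte encoding; in Node this depends on the build including full ICU data.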
I'm of two minds here. 1: we add all possible decoding schemes to lib-font, blowing it up to an incredible size, but make all strings come out as well-behaved UTF-8. And honestly, I'm leaning heavily towards (2), because it doesn't make sense to bake string encoding conversion into this library rather than making that an "if you need it, you know better than I do how to slot that into your own code base" thing. That said, we could make that a separate project (if it doesn't already exist!) and do something clever like an optional decoder argument to the Font constructor, so that if there is one, strings can be magic'd, and if there isn't, you might need to do your own decoding.
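One possible shape for the optional-decoder idea, as a hedged sketch: the library keeps handling the UTF-16 platforms itself and hands everything else to a caller-supplied function, returning `null` when nobody can decode a record. `makeNameStringReader` and the decoder signature are hypothetical, not lib-font's API, and wiring it into the `Font` constructor is not shown:

```javascript
// Hypothetical sketch, not lib-font's API.
// decoder(bytes, platformID, encodingID, languageID) may return a string,
// or undefined to decline and let the built-in path handle the record.
function makeNameStringReader(decoder) {
  return function readNameString(bytes, platformID, encodingID, languageID) {
    if (decoder) {
      const text = decoder(bytes, platformID, encodingID, languageID);
      if (typeof text === "string") return text;
    }
    // Built-in path: platform 0 (Unicode) and 3 (Windows) read as UTF-16BE.
    if (platformID === 0 || platformID === 3) {
      let text = "";
      for (let i = 0; i + 1 < bytes.length; i += 2) {
        text += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
      }
      return text;
    }
    // No decoder for this platform/encoding: return null so the caller
    // knows it has to decode the raw bytes itself.
    return null;
  };
}
```

For example, in a browser a caller could pass `(bytes, p, e) => (p === 1 && e === 0 ? new TextDecoder("macintosh").decode(bytes) : undefined)` to get Mac/Roman strings decoded without lib-font shipping any encoding tables itself.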
Right now it's using a fairly "naive" UTF-16 decoding for anything with platformID 0 (Unicode) or 3 (Microsoft), and plain "ASCII" byte decoding for anything else, but that glosses over a fair number of platformID/encodingID combinations. So... if someone wants to help out implementing all the various string decodings, let me know!
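The byte-per-character fallback described here accounts for the 0x009A seen earlier: decoding each byte as its own code point turns Mac Roman 0x9A into U+009A instead of "ö". A sketch of that dispatch, reconstructed from the description above rather than taken from lib-font's source (`naiveDecode` is a hypothetical name):

```javascript
// Reconstruction of the behavior described above (hypothetical name,
// not lib-font's source): UTF-16BE for platformIDs 0 and 3,
// one code point per byte for everything else.
function naiveDecode(bytes, platformID) {
  let text = "";
  if (platformID === 0 || platformID === 3) {
    // Combine each big-endian byte pair into one UTF-16 code unit;
    // surrogate pairs come through as two code units, which JS strings accept.
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      text += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
    }
  } else {
    // Byte value == code point: matches Mac Roman only for bytes below 0x80.
    for (const b of bytes) text += String.fromCharCode(b);
  }
  return text;
}

const macBytes = new Uint8Array([0x72, 0x9a, 0x6d]); // "r", 0x9A, "m" on platform 1
console.log(naiveDecode(macBytes, 1) === "r\u009Am"); // true: U+009A, not "ö"
```

Covering the remaining combinations then amounts to replacing the `else` branch with a lookup keyed on platformID/encodingID.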