Build out the name table decoding function to cover all platform/encodings #85

Pomax · 2020-09-09T20:32:50Z

Right now it's using a fairly "naive" UTF16 decoding for anything with platformID 0 (Unicode) or 3 (Microsoft), with "ascii" byte decoding for anything else, but that glosses over a fair number of platformId/encodingId combinations, so... if someone wants to help out implementing all the various string decodings, let me know!
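For anyone picking this up, the strategy described above can be sketched roughly as follows. This is a minimal illustration, not lib-font's actual code: `naiveDecode` and its parameters are hypothetical names, and it assumes the name record's bytes are already available as an array of numbers.

```javascript
// Hypothetical helper illustrating the "naive" strategy; not lib-font's real implementation.
function naiveDecode(platformID, bytes) {
  if (platformID === 0 || platformID === 3) {
    // Unicode and Microsoft platforms: treat the data as big-endian UTF-16,
    // two bytes per code unit.
    let out = "";
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
    }
    return out;
  }
  // Everything else: one byte per character, with no encoding table applied.
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}
```

The fallback branch is where things go wrong for non-Unicode platforms: any byte at or above 0x80 is passed through as a raw code point instead of being mapped through the record's actual encoding.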

RoelN · 2020-12-17T14:36:25Z

Is this why

Mike Abbink, Paul van der Laan, Pieter van Rosmalen, Ben Mitchell, Mark Frömberg

gets output by lib-font as

Mike Abbink, Paul van der Laan, Pieter van Rosmalen, Ben Mitchell, Mark Fr�mberg?

I'm testing this with https://github.com/IBM/plex/blob/master/IBM-Plex-Sans-Thai/fonts/complete/ttf/IBMPlexSansThai-Light.ttf

Pomax · 2020-12-18T00:50:23Z

that sounds more like whatever is rendering that text is not set to render as utf-8: where are you seeing this output? =)

RoelN · 2020-12-18T07:10:01Z

I'm seeing this on the command line, but also in the Wakamai Fondue output. This is the test script I used to verify this already happened at the lib-font side (as opposed to some data kneading on the WF side):

import { Font } from "./lib-font.js";

const font = new Font("testfont");
font.src = "./IBMPlexSansThai-Light.ttf";

font.onload = (evt) => {
  let font = evt.detail.font;
  const { name } = font.opentype.tables;
  console.log(name.get(9));
};

skyeewers · 2022-09-28T15:11:33Z

Has there been any progress on this issue? I think I'm running into the same issue as @RoelN with non-ascii characters just showing as the character-not-found-"?"-character, but I don't know enough about the encoding side of things here to try and fix this myself 😓.

Pomax · 2022-09-29T01:10:37Z

The command line is notoriously bad at UTF-8, so here's a small change to make this easier to test with:

import fs from "fs";
import { Font } from "./lib-font.js";

const font = new Font("testfont");
font.src = "IBMPlexSansThai-Light.ttf";

font.onload = (evt) => {
  let font = evt.detail.font;
  const { name } = font.opentype.tables;
  fs.writeFileSync(`test.out`, `name: ${name.get(9)}`, `utf-8`);
};

Yields

name: Mike Abbink, Paul van der Laan, Pieter van Rosmalen, Ben Mitchell, Mark Fr�mberg

The bad character there is a 0x009A, which is clearly wrong. Let's do some byte checks. Throwing this into an inspector gives us the following data block to inspect:

4D 69 6B 65 20 41 62 62 69 6E 6B 2C 20 50 61 75
6C 20 76 61 6E 20 64 65 72 20 4C 61 61 6E 2C 20
50 69 65 74 65 72 20 76 61 6E 20 52 6F 73 6D 61
6C 65 6E 2C 20 42 65 6E 20 4D 69 74 63 68 65 6C
6C 2C 20 4D 61 72 6B 20 46 72 9A 6D 62 65 72 67

And indeed, those last few bytes are "r", 0x9A, "m", "b", "e", "r", and "g" if interpreted as ASCII... so it's not a matter of reading the bytes wrong.

We also see this uses platformID=1, platEncID=0 and langID=0, which means we should be treating this as Mac/Roman/English. Sooooo we look up that encoding and find that 0x9A should be "ö".

So this is definitely a decoding issue.
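To make the lookup concrete, here is a minimal sketch of decoding those bytes as Mac OS Roman. It is deliberately partial: the table only covers bytes 0x80 through 0x9F (the accented Latin letters), and `decodeMacRomanPartial` is a hypothetical name, not something lib-font provides. A real decoder would need the full 128-entry upper half.

```javascript
// Partial Mac OS Roman table for bytes 0x80-0x9F only (an illustrative subset).
// Glyphs, in order: Ä Å Ç É Ñ Ö Ü á à â ä ã å ç é è ê ë í ì î ï ñ ó ò ô ö õ ú ù û ü
const MAC_ROMAN_80_9F =
  "\u00C4\u00C5\u00C7\u00C9\u00D1\u00D6\u00DC\u00E1\u00E0\u00E2\u00E4\u00E3\u00E5\u00E7\u00E9\u00E8" +
  "\u00EA\u00EB\u00ED\u00EC\u00EE\u00EF\u00F1\u00F3\u00F2\u00F4\u00F6\u00F5\u00FA\u00F9\u00FB\u00FC";

// Hypothetical helper: decodes a byte array as Mac OS Roman, within the subset above.
function decodeMacRomanPartial(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // lower half is plain ASCII
    } else if (b <= 0x9f) {
      out += MAC_ROMAN_80_9F[b - 0x80]; // table lookup
    } else {
      out += "\uFFFD"; // not covered by this sketch
    }
  }
  return out;
}

// The last eight bytes of the data block above: 46 72 9A 6D 62 65 72 67
const tail = [0x46, 0x72, 0x9a, 0x6d, 0x62, 0x65, 0x72, 0x67];
console.log(decodeMacRomanPartial(tail)); // "Frömberg"
```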

Pomax · 2022-09-29T02:42:29Z

I'm of two minds here.

1: we add all possible decoding schemes to lib-font, blowing it up to an incredible size, but make all strings come out as well-behaved UTF8
2: make this the consuming code's responsibility, with lib-font giving you the bytes, and the information you need to know what encoding it's using, but not performing automagical conversion to UTF8

And honestly, I'm leaning heavily towards (2) because it doesn't make sense to bake string encoding conversion into this library rather than making that an "if you need it, you know better than I do how to slot that into your own code base".

That said, we could make that a separate project (if it doesn't already exist!) and do something clever like an optional decoder argument to the Font constructor so that if there is one, strings can be magic'd, and if there isn't, you might need to do your own decoding.
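As a rough sketch of what that optional-decoder idea could look like: everything here is hypothetical (`readNameString`, the record shape, and the decoder signature are all assumptions for illustration, and none of it exists in lib-font today). The decoder gets the raw bytes plus the IDs, and either returns a string or declines.

```javascript
// Hypothetical shape only; lib-font does not currently expose anything like this.
// `record` is assumed to carry the raw bytes and the IDs from the name table.
function readNameString(record, decoder) {
  const { platformID, encodingID, languageID, bytes } = record;
  if (typeof decoder === "function") {
    const text = decoder(bytes, { platformID, encodingID, languageID });
    if (typeof text === "string") return { text, decoded: true };
  }
  // No decoder, or the decoder declined: hand back the bytes and the
  // information needed to decode them elsewhere.
  return { bytes, platformID, encodingID, languageID, decoded: false };
}

// Example decoder (also hypothetical): only handles pure 7-bit payloads.
function asciiOnly(bytes) {
  if (!bytes.every((b) => b < 0x80)) return undefined; // decline
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

const record = { platformID: 1, encodingID: 0, languageID: 0, bytes: [0x4d, 0x69, 0x6b, 0x65] };
console.log(readNameString(record, asciiOnly).text); // "Mike"
```

The point of the design is that the library never guesses: without a decoder, consuming code still gets the bytes and enough metadata to pick the right encoding itself.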

Pomax added the "help welcome" and "enhancement" labels Sep 9, 2020

Pomax mentioned this issue Sep 10, 2020

Name table entries contain null bytes #74

Closed
