Emoji should default to emoji presentation, not text presentation #1104

Closed
gnprice opened this issue Dec 4, 2024 · 2 comments · Fixed by #1108

@gnprice
Member

gnprice commented Dec 4, 2024

Some emoji characters in Unicode have a "text presentation" as well as an "emoji presentation". Typically the "text presentation" is black-and-white and around the same size as letter glyphs, while the "emoji presentation" is colorful and larger.

For example, ❤ U+2764 HEAVY BLACK HEART can appear either as ❤︎ (the text presentation) or ❤️ (the emoji presentation).

On choosing between the two, the Unicode spec (in TR 51) says:

> In informal environments like texting and chats, it is more appropriate for most emoji characters to appear with a colorful emoji presentation, and only get a text presentation with a text presentation selector. Conversely, in formal environments such as word processing, it is generally better for emoji characters to appear with a text presentation, and only get the colorful emoji presentation with the emoji presentation selector.

So for Zulip we should prefer the colorful emoji presentation. Currently we favor the text presentation instead, for those emoji that have both. For example, this message reads :heart: in Markdown, and on the web it looks like so:
image
But in this app it currently looks like so:
image
(In zulip-mobile it looks… blank. I don't have a diagnosis for that.)

The symptom was originally spotted by @rajveermalviya, at 3efe038 / 3efe038 (in the draft #1103).

This issue doesn't affect most emoji — newer emoji, including those added in the emoji boom, have only emoji presentations. Probably U+2764, aka :heart: in Zulip, is the most conspicuous example.

Implementation

One way to control the presentation is to follow the emoji with the code point U+FE0F VARIATION SELECTOR-16 to request emoji presentation, or U+FE0E VARIATION SELECTOR-15 to request text presentation. (That's what I did at the top of this issue description, to get the two different presentations here in GitHub.) It'd be messy to try to do that globally everywhere emoji appear, though.
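As a rough illustration of the variation-selector approach, here is a minimal Python sketch. The helper name `with_presentation` is hypothetical and not part of any Zulip code; it only appends the selector code points named above.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def with_presentation(char: str, emoji: bool = True) -> str:
    """Return `char` followed by the selector for the requested presentation.

    Any selector already trailing `char` is dropped first, so the result
    never ends with two selectors in a row.
    """
    base = char.rstrip(VS15 + VS16)
    return base + (VS16 if emoji else VS15)


heart = "\u2764"  # HEAVY BLACK HEART
print([f"U+{ord(c):04X}" for c in with_presentation(heart)])
# → ['U+2764', 'U+FE0F']
```

Doing this in one helper is easy; the messy part is making sure every place that renders an emoji goes through it.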

Happily I think we can control it instead by the ordering of font choices: just make sure the emoji font comes first. An emoji with both kinds of presentation will have glyphs in both the emoji font and a plain-text font, and what we want is to select the emoji font's glyph.

Scope

This issue covers only where we explicitly know we're working with an emoji:

  • UnicodeEmojiNode in message content, corresponding to a span.emoji element in the HTML;
  • emoji reaction chips;
  • emoji autocomplete results;
  • and I think that's currently it.

(For reaction chips and autocomplete results we already get this right on Android, and the issue only affects iOS. See our UnicodeEmojiWidget, and the changes there in #1108.)

Out of scope for this issue is when a literal emoji character appears as part of some user-generated text:

  • TextNode in message content, corresponding to a text element in the HTML. As far as I know there's no longer a way to generate a message with emoji in a text node, though there used to be — compare this test message yesterday, where the only emoji is in an emoji span, to the message from 2020 it's quoting, where the same Markdown source produced some literal emoji following an emoji span.
  • Topics, channel names, users' names, organization names, and other places where more-or-less-arbitrary plain text appears.

Those are out of scope because they're harder — see the comments below — and also are much less common. I'll file a separate follow-up issue for them.

@gnprice gnprice added the a-content Parsing and rendering Zulip HTML content, notably message contents label Dec 4, 2024
@gnprice gnprice added this to the M5: Launch milestone Dec 4, 2024
@gnprice gnprice self-assigned this Dec 4, 2024
@gnprice
Member Author

gnprice commented Dec 4, 2024

> Happily I think we can control it instead by the ordering of font choices: just make sure the emoji font comes first.

Hmm, nope — that goes much too far:
image

Note the giant spaces in the app bar, recipient headers, and message text; and the emoji-styled numbers in the timestamps and even an image filename. Those happen because U+0020 SPACE and U+0030..0039 DIGIT ZERO..DIGIT NINE all have glyphs in the Noto Color Emoji font.
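The effect can be reproduced with a toy model of font fallback, where each character is drawn by the first font in the list that has a glyph for it. This is only a sketch: the coverage sets below are small illustrative assumptions, not data read from the real font files, and `pick_font` is a hypothetical stand-in for what the text layout engine does.

```python
# Toy coverage tables: the code points each font is assumed to have glyphs for.
EMOJI_FONT = {0x20, *range(0x30, 0x3A), 0x2764, 0x1F600}
TEXT_FONT = {*range(0x20, 0x7F), 0x2764}

FONTS = {"emoji": EMOJI_FONT, "text": TEXT_FONT}


def pick_font(code_point, order):
    """Return the name of the first font in `order` with a glyph, else None."""
    for name in order:
        if code_point in FONTS[name]:
            return name
    return None


def fonts_used(text, order):
    """Return, per character of `text`, the font that would draw it."""
    return [pick_font(ord(ch), order) for ch in text]


# Text font first: the heart gets its text glyph (the original problem).
print(fonts_used("\u2764", ["text", "emoji"]))  # → ['text']
# Emoji font first: the heart is fixed...
print(fonts_used("\u2764", ["emoji", "text"]))  # → ['emoji']
# ...but spaces and digits in ordinary text are now taken from the emoji font too.
print(fonts_used("a 1", ["emoji", "text"]))     # → ['text', 'emoji', 'emoji']
```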

In fact it looks like according to the Unicode emoji data files pointed to by TR 51, U+2764 aka :heart: is in exactly the same category as U+0030 DIGIT ZERO and the other plain old ASCII digits: they both have Emoji_Presentation=No (meaning they won't default to emoji presentation in a general-purpose context like a browser), but have Emoji=Yes (meaning they do have an emoji presentation).

That rather seems like a bug in TR 51. Rereading it (in section 4, "Presentation Style") and comparing with that data file emoji-data.txt, the spec seems to pretty clearly say that "informal environments like texting and chats" should show a character like "0" as an emoji. Clearly nobody wants that.
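To make the comparison concrete, here is a small Python sketch that parses lines in the style of emoji-data.txt and checks the two properties. The `SAMPLE` text is a hand-written excerpt modeled on the file's `code points ; property # comment` layout, not a verbatim copy, and the function names are hypothetical.

```python
# Hand-written sample in the layout of emoji-data.txt (an assumption, not a copy).
SAMPLE = """\
0030..0039    ; Emoji                # digit zero..digit nine
2764          ; Emoji                # red heart
1F600         ; Emoji                # grinning face
1F600         ; Emoji_Presentation   # grinning face
"""


def parse_emoji_data(text):
    """Map each code point to the set of property names listed for it."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        code_points, prop = (part.strip() for part in line.split(";"))
        start, _, end = code_points.partition("..")
        first, last = int(start, 16), int(end or start, 16)
        for cp in range(first, last + 1):
            props.setdefault(cp, set()).add(prop)
    return props


def is_text_default_emoji(props, cp):
    """True when Emoji=Yes but Emoji_Presentation=No for `cp`."""
    found = props.get(cp, set())
    return "Emoji" in found and "Emoji_Presentation" not in found


table = parse_emoji_data(SAMPLE)
print(is_text_default_emoji(table, 0x2764))   # heart → True
print(is_text_default_emoji(table, 0x30))     # digit zero → True
print(is_text_default_emoji(table, 0x1F600))  # grinning face → False
```

By these two properties alone, the heart and the digit zero are indistinguishable, which is why a rule based only on them can't single out the heart.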

@gnprice
Member Author

gnprice commented Dec 5, 2024

In principle one solution that would be nice here is to have a font file that's a subset of Noto Color Emoji, containing only the glyphs we actually prefer (so excluding U+0020 SPACE and the digits 0..9 and so on), and put that font first in the list. (Then we'd have a separate font file that is the complementary subset, and when specifically presenting an emoji we'd list both the emoji fonts before the text font.)
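Sketched as a toy model in Python, the idea is to split the emoji font's coverage into two complementary subsets and use different font orders for general text and for explicit emoji. All of the coverage sets, font names, and the `EXCLUDED` list here are illustrative assumptions; no real font file is read or written.

```python
# Assumed coverage of the full emoji font and of the text font.
EMOJI_FULL = {0x20, *range(0x30, 0x3A), 0x2764, 0x1F600}
TEXT_FONT = set(range(0x20, 0x7F)) | {0x2764}

# Code points we would not want drawn as emoji in ordinary text (assumed list).
EXCLUDED = {0x20, *range(0x30, 0x3A)}

preferred = EMOJI_FULL - EXCLUDED   # subset that can safely come first everywhere
complement = EMOJI_FULL & EXCLUDED  # the rest, listed only for explicit emoji

FONTS = {"emoji-preferred": preferred, "emoji-rest": complement, "text": TEXT_FONT}
GENERAL_ORDER = ["emoji-preferred", "text"]
EMOJI_ORDER = ["emoji-preferred", "emoji-rest", "text"]


def pick_font(code_point, order):
    """Return the first font in `order` with a glyph for `code_point`, else None."""
    return next((name for name in order if code_point in FONTS[name]), None)


print(pick_font(0x2764, GENERAL_ORDER))  # → emoji-preferred
print(pick_font(0x31, GENERAL_ORDER))    # → text
print(pick_font(0x31, EMOJI_ORDER))      # → emoji-rest
```

With this split, the heart gets its emoji glyph even in general text, while digits and spaces fall through to the text font unless an emoji is being presented explicitly.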

It looks like there's no legal obstacle to that — the OFL license that Noto Color Emoji comes with (recorded in our tree at assets/Noto_Color_Emoji/LICENSE) allows modification, under permissive conditions.

The OFL license has an option for the copyright holder to specify a "Reserved Font Name", and then modifying the font might come with an annoying need to rename the font too; but the Noto fonts have no reserved name in general, and in particular the copyright notice in this font file is just "Copyright 2022 Google Inc." — no mention of a reserved name, and that's the place the license prescribes for specifying one.

So I spent about an hour today trying to find an appropriate tool to produce that subset. It didn't work out, so I'm abandoning that line of investigation here. In particular I tried:

  • FontForge. You can write something like fontforge -c 'f=open(argv[1]); s=f.selection; s.select(("ranges",), 0x00, 0xFF); f.clear(); f.save(argv[2])' assets/Noto_Color_Emoji/Noto-COLRv1{,-subset}.ttf, and it will read the given font and remove the given range of characters and write out to the given file.

    But the format it writes with that f.save call is FontForge's own internal ASCII-based format. For producing a font file that UI software will consume, one uses font.generate(…). That call takes all kinds of flags specifying details of the output font, and it doesn't look like it has a feature to preserve all the structure from the input font and just remove the given glyphs.

  • fonttools, and its pyftsubset CLI program. When I ran a command to just copy the entire font, without yet specifying a smaller subset:
    pyftsubset assets/Noto_Color_Emoji/Noto-COLRv1.ttf --unicodes='*'
    it crashed with a cryptic Python traceback, ending with:

      File "/usr/lib/python3/dist-packages/fontTools/ttLib/tables/otBase.py", line 266, in readValue
        value, = struct.unpack(f">{typecode}", self.data[pos:newpos])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    struct.error: ('unpack requires a buffer of 1 bytes', 'ClipList')
    

    (That's with the fonttools version 4.38.0-1 package I got from Debian.)

Probably the next thing I would try is to use the upstream noto-emoji tree itself, from https://github.com/googlefonts/noto-emoji, with the tools they have there for producing the font files, and tweak those scripts to produce the desired subsets. But that means a further escalation in the effort level, so I'm not doing that in the near term.

gnprice pushed a commit to gnprice/zulip-flutter that referenced this issue Dec 6, 2024
Some unicode characters, like U+2764 (❤) or U+00AE (®) can
have glyphs in non-Emoji fonts, resulting in incorrect
rendering of such characters, where we specifically want an
emoji to be displayed.

So, explicitly mention "Apple Color Emoji" to be the font used on
iOS/macOS for displaying the unicode emoji.

This resolves part of zulip#1104, namely for reaction chips
and autocomplete results.

[greg: wrote test]

Fixes-partly: zulip#1104
@gnprice gnprice closed this as completed in f7421bf Dec 6, 2024