Incorrect Width Calculation for Characters with Variation Selector-16 #274

Open
skiars opened this issue Jun 5, 2024 · 2 comments
@skiars
skiars commented Jun 5, 2024

Description

I've noticed that the wctwidth / wcswidth functions in the vty library seem to miscalculate the width of certain text, particularly text that involves Variation Selector-16 (U+FE0F). For example, the emoji 🏞️ (National Park) is composed of U+1F3DE followed by U+FE0F. This should be rendered as a colorful double-width emoji. However, the current width calculation doesn't reflect this.

Steps to Reproduce

  1. Use the wctwidth function to calculate the width of the character 🏞️ (which is U+1F3DE followed by U+FE0F).
  2. Observe that the calculated width does not match the expected width (normally, it should be 2).

Example Code

ghci> import Graphics.Text.Width
ghci> wcswidth "🏞️"  -- This should ideally return 2, but it doesn't

The wcswidth function should return 2 for 🏞️, since the sequence should be treated as a single double-width emoji. But the wcwidth function currently returns 1, which does not account for Variation Selector-16 and results in incorrect rendering, with the cursor position becoming misaligned in terminals.

Environment

  • OS: ArchLinux
  • Terminal: Konsole
  • Vty Version: 6.2

Additional Context

Variation Selector-16 (U+FE0F) is used to indicate that the preceding character should be displayed as an emoji. Proper support for this selector is crucial for accurate width calculation of such Unicode sequences.
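
To make the expected behaviour concrete, here is a minimal sketch of a width sum that looks one character ahead for U+FE0F. It is not vty's implementation: `widthWithVS16` and `toyWidth` are hypothetical names, the per-character width function is passed in as a parameter (vty's `wcwidth` could presumably be supplied there), and it assumes that any character followed by VS16 is a valid emoji variation sequence that the terminal draws two columns wide.

```haskell
-- | VARIATION SELECTOR-16: requests emoji presentation for the
-- character that precedes it.
vs16 :: Char
vs16 = '\xFE0F'

-- | Sum the display widths of a string, looking one character ahead so
-- that "base character + VS16" is counted as one two-column unit.
-- ASSUMPTION: every character followed by VS16 is rendered as a
-- double-width emoji; real terminals do not all agree on this.
widthWithVS16 :: (Char -> Int) -> String -> Int
widthWithVS16 charWidth = go
  where
    go (_ : v : rest)
      | v == vs16 = 2 + go rest            -- base + VS16: two columns
    go (c : rest) = charWidth c + go rest  -- no selector follows
    go [] = 0

-- | Hypothetical stand-in for a per-character width function, used only
-- to keep this sketch self-contained: VS16 is zero-width, anything else
-- is one column.
toyWidth :: Char -> Int
toyWidth c
  | c == vs16 = 0
  | otherwise = 1
```

With these definitions, `widthWithVS16 toyWidth "\x1F3DE\xFE0F"` evaluates to 2, whereas summing `toyWidth` over the same two code points gives 1.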

For reference, this issue has been observed with Windows Terminal too, and here is some relevant information:

Would be great to discuss potential fixes or workarounds for this issue. Thank you for your attention and support!

@jtdaugherty
Owner

Thanks for filing this!

This version of the problem is due to the fact that Vty does no lookahead when computing character width. There are other Unicode features that would also need to be considered to do proper lookahead as far as I am aware, such as zero-width joiners. I haven't investigated what it would take to do this properly, largely because I really don't want to re-implement various bits of the Unicode spec in vty. So if you know of a Haskell implementation that deals with this in an efficient way, I would love to know about it!
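
As a rough illustration of what "lookahead" means here, the sketch below groups a string into crude clusters before any width is assigned: a base character, any Variation Selector-16 that follows it, and anything glued on by a zero-width joiner. The name `clusters` is hypothetical, and this is a deliberate simplification rather than the grapheme cluster rules of the Unicode standard.

```haskell
zwj, vs16 :: Char
zwj  = '\x200D'  -- ZERO WIDTH JOINER
vs16 = '\xFE0F'  -- VARIATION SELECTOR-16

-- | Split a string into crude clusters by looking ahead from each base
-- character. A cluster is the base character, plus any VS16 that
-- follows, plus any "ZWJ + next character" pairs.
-- ASSUMPTION: this ignores combining marks, skin-tone modifiers, flags,
-- and everything else a full segmentation algorithm would handle.
clusters :: String -> [String]
clusters [] = []
clusters (c : rest) = (c : attached) : clusters remaining
  where
    (attached, remaining) = absorb rest

    -- Consume everything that attaches to the current base character.
    absorb (v : xs)
      | v == vs16 = prepend [v] (absorb xs)
    absorb (j : x : xs)
      | j == zwj = prepend [j, x] (absorb xs)
    absorb xs = ([], xs)

    prepend pre (ys, zs) = (pre ++ ys, zs)
```

A width function would then assign one width per cluster instead of one per character, which is exactly the part where terminal emulators can disagree.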

The problem is deeper even than what is reported here, unfortunately. For posterity, there are some other older tickets that capture some of the issues:

Essentially, Vty could have a perfectly correct implementation of width calculation and then still disagree with some terminal emulators on the width of some Unicode sequences, depending on the implementation in those terminal emulators. We've run into this when trying to "fix" Vty in this way, only to have to back out the changes because Vty then came into greater disagreement with some terminal emulators on character widths, resulting in broken rendering and cursor placement. @glguy helped us develop a partial solution to this problem by interrogating the terminal to ask it how wide Unicode characters are, but that only worked for single-character tests that don't require lookahead, so in practice it doesn't work well enough to be a fully general solution.
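
For readers unfamiliar with the idea of interrogating the terminal: one common way to do it is to request a Cursor Position Report (send `ESC [ 6 n`, receive `ESC [ row ; col R`) before and after printing a character, then compare the columns. The sketch below covers only the pure part, parsing the report and computing the difference. `parseCPR` and `measuredWidth` are hypothetical names, and this is not necessarily how the Vty tooling does it.

```haskell
import Data.Char (isDigit)

-- | Parse a Cursor Position Report of the form ESC [ row ; col R.
-- Returns Nothing for anything that does not match exactly.
parseCPR :: String -> Maybe (Int, Int)
parseCPR ('\ESC' : '[' : rest) =
  case span isDigit rest of
    (row@(_ : _), ';' : rest') ->
      case span isDigit rest' of
        (col@(_ : _), "R") -> Just (read row, read col)
        _ -> Nothing
    _ -> Nothing
parseCPR _ = Nothing

-- | Width implied by two cursor positions taken before and after
-- printing some text.
-- ASSUMPTION: the text did not wrap, so both reports are on the same row.
measuredWidth :: (Int, Int) -> (Int, Int) -> Maybe Int
measuredWidth (r0, c0) (r1, c1)
  | r0 == r1 && c1 >= c0 = Just (c1 - c0)
  | otherwise = Nothing
```

Measuring a whole sequence such as U+1F3DE followed by U+FE0F this way would mean printing both code points between the two reports, rather than testing one character at a time.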

Given those issues, I don't know what the best path forward is. At a minimum, it would be nice to have access to an implementation of width calculations that we don't need to maintain, leaving that to people who know the Unicode spec and its various versions much better than I do. That would at least give us a starting point for knowing what "good" needs to look like, and we could then see how well it works in practice with terminal emulators whose Unicode implementations might also be stale and/or incorrect when it comes to character widths.

@jtdaugherty
Owner

(And, utf8proc was an attempt at exactly this: relying on what seemed to be a well-maintained library for dealing with some of these issues. Maybe that's still a good way to go, ultimately, but I don't recall whether that library would have helped with lookahead-related issues.)
