Possible Bug: Unicode conversion. #1103

Closed
Clemi81 opened this issue Feb 14, 2019 · 22 comments

Comments

@Clemi81

Clemi81 commented Feb 14, 2019

Hello,

there seems to be a bug regarding the conversion of Unicode characters with Prawn.

Please see this link for details:
http://discuss.asciidoctor.org/Unicode-characters-not-converted-in-pdf-td6703.html

Thank you for reading,
Clemens

@gettalong
Member

I have read the discussion - please try the sample script provided by @mojavelinux with the GNU FreeSerif font. This font has the requested character as well as a mapping table understood by ttfunk, the font library used by Prawn.

It may be the case that other fonts only provide some kind of internal link from the requested character to the Greek sigma symbol, and that ttfunk has not implemented support for this kind of redirection.

@mojavelinux
Contributor

mojavelinux commented Feb 14, 2019

I can confirm that GNU FreeSerif font works. What's curious is that VL-Gothic-Regular, which also has this character, does not work. So this is very likely a problem in ttfunk as @gettalong has suggested. I did not dive into the two fonts to try to see what the tables look like.

@mojavelinux
Contributor

To be clear, this is not a Unicode conversion problem. It's a glyph resolution problem.

@Clemi81
Author

Clemi81 commented Feb 15, 2019

OK, here is what I did: I installed the GNU FreeSerif font, configured it in my AsciiDoc document, and converted the document using Asciidoctor PDF. I see that the font is different, so this seems to work. However, my special characters stay unconverted; they are printed exactly as entered in the AsciiDoc file.

Example characters: 𝜎 or &#355. There is no error message regarding the conversion of the characters.

@mojavelinux
Contributor

@Clemi81 Can you please test the sample script below? It's probably not useful for this forum to be testing using Asciidoctor PDF.

require 'prawn'

Prawn::Document.generate 'missing-glyph.pdf' do
  def register_font data
    font_families.tap {|accum| data.each {|key, val| accum[key.to_s] = val } }
  end

  register_font Serif: {
    normal: '/usr/share/fonts/gnu-free/FreeSerif.ttf'
  }

  font :Serif do
    text '𝜎'
  end
end

@Clemi81
Author

Clemi81 commented Feb 15, 2019

Dear Dan,
I will do this. But a stupid question: how do I run this script on its own?

I use Windows 7.

@pointlessone
Member

@Clemi81 You'd need to install Ruby. I assume you already have it installed since you're experiencing the issue. You'd also need to install the prawn gem. I assume you have that too for the same reason.

You need to save the script into a file. Then, on a command line, execute ruby test-script.rb. Note that you need to fix the file path on line 9 (after normal:) to point to the corresponding font file on your hard drive.

@1marc1

1marc1 commented Feb 16, 2019

I just ran this script with the font I was testing with in the asciidoctor-pdf discussion.

The resulting output is a single square outline in the top left corner of the page.

image

Marc.

@1marc1

1marc1 commented Feb 16, 2019

Some more pictures. When I create the same table as I did in the asciidoctor-pdf discussion, starting from character 120512, while using the font FreeSans, I get the following output:

freesans

Now, when I change the font to FreeSerif, the output looks as follows:

freeserif

Below is a screenshot showing those two fonts next to each other in FontForge. FreeSerif is on the left, FreeSans is on the right. The highlighted position in the top left of each of the two windows is character 120512.

fontforge

I hope this helps.

Marc.

@Clemi81
Author

Clemi81 commented Feb 18, 2019

Hello,

I get the same result as Marc for sigma. FreeSerif works; FreeSans shows a square with a ?.

I played a bit more and tested the characters in the Asciidoc.

I did the following. I wrote the Unicode of sigma into the document: σ
In the PDF the character is not rendered; it's shown as: σ
In the Firefox plugin it's shown correctly: σ
Then I copied the σ and pasted it as is into the AsciiDoc document next to σ
Output in PDF is: σ σ
Output in Firefox plugin is: σ σ

So it seems it's not only a font problem. It seems that Asciidoctor PDF does not recognize that it should convert the character σ.

For me, one workaround could be to render a character in Firefox and copy it back into the AsciiDoc file. Unfortunately, there are some symbols that are still not recognized even when I use this workaround.

For example arrow up ⤉

@mojavelinux
Contributor

mojavelinux commented Feb 18, 2019

@Clemi81 Thank you for testing and for the additional information.

I'd like to encourage you again not to discuss behaviors directly related to Asciidoctor PDF here. In this thread, we should only be talking about Ruby code that uses the Prawn API directly. That keeps the discussion on point.

(It's very likely that the raw σ output from Asciidoctor PDF is due to a misconfiguration (or edge case) of your AsciiDoc source and not related to Prawn. It's simply not true that Asciidoctor PDF does not recognize σ in general. However, it could be something specific to your document. Let's discuss that part in the Asciidoctor forum.)

When testing here, we should always be referring back to the sample Ruby application that was posted above.

@Clemi81
Author

Clemi81 commented Feb 18, 2019

Dear Dan,

I'm sorry. I tend to post all information I have, but I understand it may be misleading.

Cheers,

Clemens

@1marc1

1marc1 commented Feb 18, 2019

I made a slight modification to the script:

require 'prawn'

Prawn::Document.generate 'missing-glyph.pdf' do
  def register_font data
    font_families.tap {|accum| data.each {|key, val| accum[key.to_s] = val } }
  end

  register_font Serif: {
    normal: '/tmp/FreeSerif.ttf'
    #normal: '/tmp/FreeSans.ttf'
  }

  font :Serif do
    text "\u03c3"    #  03c3 is the hex equivalent of decimal 963 - greek small letter sigma
  end
end

The above works for both FreeSerif and FreeSans.

I also tried to use text "\u1d70e" (the hex equivalent of decimal 120590), which rendered as an 'n with a middle tilde', followed by the letter 'e'. I found that U+1d70 is actually a glyph in FreeSerif (small letter n with middle tilde). All of this probably says more about my Ruby skills than anything else. Or is it a clue to finding out what is going on? How would one render a large (> U+FFFF) Unicode character using the script above and referencing the character by number?

Marc.

@gettalong
Member

You can use \u{1d70e} for Unicode characters with a codepoint greater than 0xFFFF - see http://ruby-doc.org/core-2.6.1/doc/syntax/literals_rdoc.html#label-Strings

Generally, please note that although U+03C3 and U+1D70E may have the same appearance, there must be support in the font file for both codepoints. So testing U+03C3 probably won't help for this problem.
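To illustrate the difference between the two escape forms, here is a minimal sketch that uses only core Ruby (the variable names are illustrative and not part of the test script above):

```ruby
# "\u" takes exactly four hex digits, so the trailing "e" becomes a separate character.
four_digit = "\u1d70e"
# The braced form accepts longer hex values and yields a single codepoint.
braced = "\u{1d70e}"

four_digit_codepoints = four_digit.codepoints.map { |cp| format('U+%04X', cp) }
braced_codepoints = braced.codepoints.map { |cp| format('U+%04X', cp) }

p four_digit_codepoints # => ["U+1D70", "U+0065"]
p braced_codepoints     # => ["U+1D70E"]
p braced.ord            # => 120590
```

This matches what was observed: the four-digit form produces U+1D70 followed by a plain "e", while the braced form produces the single character 120590.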

@1marc1

1marc1 commented Feb 19, 2019

Thank you for clarifying.

@pointlessone
Member

I looked a bit into this and I'm not sure what the issue is. To me it looks like the wrong font is used in asciidoctor-pdf or something. The test script Dan provided seems to work as expected. Specifically, it shows a missing character for FreeSans because that font doesn't have a glyph for character 120590 (Mathematical Italic Small Sigma), and it shows a fine sigma with the FreeSerif font because it does have the glyph.

I'm closing this now but feel free to reopen if you have more pointers to how Prawn is at fault here.

@mojavelinux
Contributor

mojavelinux commented Jan 22, 2024 via email

@pointlessone
Member

@mojavelinux I'm confused. It seems like you're addressing a different issue in your comment. Could you please confirm that we're still talking about the missing glyphs? The width of glyphs was never mentioned in this issue or the linked asciidoctor forum thread.

My understanding is that a person wants to use some specific character. They have it in their source document specified as a direct Unicode codepoint. They're confused that the character is displayed in Firefox (I presume, HTML version of the generated document) but not in PDF.

I took your script from your previous comment. I also took the latest FreeFont (freefont-ttf-20120503.zip).

I get a Missing Character glyph with the FreeSans font (in the latest version of FreeFont it's a rectangle with a question mark in it) and a sigma with FreeSerif. I also confirmed that both glyphs have non-zero width. Likewise, I confirm that FreeSerif does have a glyph for character code 120590, and FreeSans does not. The screenshot from 1marc1 is very much indicative (character code 120590 is in row 5, second from the right).

From my point of view, Prawn displays correct glyphs for both fonts. I understand that it doesn't match HTML behaviour. My assumption is that the user didn't specify the correct font and Firefox is more persistent in finding a fallback font with a glyph for the character. I agree that it's great for the end user but this is not the promise Prawn ever gave. Prawn only uses specified fonts and doesn't go looking until it finds every single glyph. Could you please confirm, deny, or otherwise state the desired outcome? Could you please describe in what way the output of your test script is different from what you'd expect? Or if it's not representative, could you please provide another one that would demonstrate the issue?
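The difference between the two lookup strategies described above can be sketched with a toy model. This is not how Prawn, ttfunk, or Firefox are implemented: the tables are hypothetical stand-ins for a font's character map, the coverage follows what was reported in this thread, and the glyph IDs are made up. The only real-world fact assumed is that glyph 0 is the missing-character glyph.

```ruby
# Toy model only: these tables are NOT read from the real font files.
notdef_gid = 0 # glyph 0 is the "missing character" glyph

toy_fonts = {
  'FreeSerif' => { 0x03C3 => 101, 0x1D70E => 202 }, # made-up glyph IDs
  'FreeSans'  => { 0x03C3 => 111 }                  # no entry for U+1D70E
}

# Strict lookup: only the requested font is consulted.
strict_glyph = lambda do |font_name, codepoint|
  toy_fonts.fetch(font_name).fetch(codepoint, notdef_gid)
end

# Fallback lookup: use the first font in the list that has a glyph.
fallback_glyph = lambda do |font_names, codepoint|
  name = font_names.find { |n| strict_glyph.call(n, codepoint) != notdef_gid }
  name ||= font_names.first
  [name, strict_glyph.call(name, codepoint)]
end

p strict_glyph.call('FreeSans', 0x1D70E)                # => 0
p strict_glyph.call('FreeSerif', 0x1D70E)               # => 202
p fallback_glyph.call(%w[FreeSans FreeSerif], 0x1D70E)  # => ["FreeSerif", 202]
```

In this model, the strict lookup corresponds to what the test script shows for each font on its own, and the fallback lookup corresponds to the assumed browser behaviour of searching further fonts.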

@gettalong
Member

@pointlessone @mojavelinux If I may: I think that this issue and the one from #1322 get a bit mixed up. From what I can see in this issue, it is a problem with fonts missing some glyphs, as @mojavelinux said. So I think this issue is indeed solved.

However, the one with the missing glyph (gid=0) having width 0 instead of the correct width of that glyph is still being debated over at #1322.

@pointlessone
Member

@gettalong I agree. I just want to make sure we're on the same page with Dan.

@mojavelinux
Contributor

mojavelinux commented Jan 23, 2024

My mistake. My comment ended up on the wrong issue. I was indeed referring to #1322. And it seems there is now an update there. Please disregard my previous statement as it was misdirected. My apologies.

Thanks @gettalong for playing moderator and getting us back on track.

@mojavelinux
Contributor

Back to the topic at hand, I think I know what the problem is. There are two characters which look visually identical, but are not actually the same Unicode character:

puts '𝜎' == 'σ'

The first is U+1d70e (mathematical italic small sigma) whereas the second is U+03c3 (greek small letter sigma, sigma). The font in question is missing one of them, so it correctly displays the missing glyph character. I assert that Prawn is doing the correct thing.
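As a minimal sketch of the point above, the two characters can be compared by codepoint in core Ruby. The variable names are illustrative. The NFKC step is an assumption about what a document author might accept as a workaround (it discards the "mathematical italic" distinction); it is not something Prawn does.

```ruby
math_sigma  = "\u{1d70e}" # MATHEMATICAL ITALIC SMALL SIGMA
greek_sigma = "\u03c3"    # GREEK SMALL LETTER SIGMA

same_string = math_sigma == greek_sigma
codepoints  = [math_sigma, greek_sigma].map { |s| format('U+%04X', s.ord) }

# Assumption: compatibility normalization (NFKC) is acceptable for the text.
normalized = math_sigma.unicode_normalize(:nfkc)

p same_string               # => false
p codepoints                # => ["U+1D70E", "U+03C3"]
p normalized == greek_sigma # => true
```

A font therefore has to contain a glyph for whichever of the two codepoints actually appears in the source text, unless the text is normalized before rendering.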
