Surrogate characters not working #63
Comments
I was able to create a version of PDFsharp that supports surrogate characters. I needed to change some variables. For the project I'm working on, it caused no problems, but I can't say I tested it thoroughly with many fonts.
Thanks for the feedback and your effort. Not much new code and the risk of breaking existing code should be minimal. I think a pull request won't be helpful as we use GitHub for distribution only.
I'm glad if this helps others.
If your text has surrogate characters but the font doesn't have a format 12 table, my code currently throws an exception. The previous behavior was to print two wrong characters.
Hello 😃,
Any update on this?
@LokiMidgard Thanks a lot for your additions! I applied them to PdfSharp source code version 1.50 beta 5, built the project in Visual Studio Community to get the DLL and added it from "src\PdfSharp\bin\Debug\PdfSharp.dll" to my project. To get emojis working in PDF, I had to use the font "Segoe UI Emoji" (I'm on Windows 10). This works fine, but two problems arise:
Example:

```csharp
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
PdfDocument document = new PdfDocument();
PdfPage page = document.AddPage();
XGraphics gfx = XGraphics.FromPdfPage(page);
XPdfFontOptions options = new XPdfFontOptions(PdfFontEncoding.Unicode, PdfFontEmbedding.Always);
XFont font = new XFont("Segoe UI Emoji", 12, XFontStyle.Regular, options);
gfx.DrawString("111😢😞💪", font, XBrushes.Black, new XRect(0, 0, page.Width, page.Height), XStringFormats.Center);
document.Save("C:\\test.pdf");
```

I tried to debug, but I do not understand what RawUnicodeEncoding is. For the above string it's "<00140014001425F503B20662>". Numeral 1 corresponds to "0014", but I have no clue why. The decimal value for ASCII 1 is 49. Manually counting glyphs in the font "Segoe UI Emoji", glyph 1 should be at position 19. What does the "0014" represent? It's neither the decimal ASCII value nor the glyph position in the font.
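For reference, the raw string can be split into 16-bit values to see what each group holds. The sketch below is only an illustration: `RawStringDump` and `DecodeHexWords` are hypothetical names, not part of PDFsharp, and it assumes the string is a sequence of four-digit hexadecimal groups. Under that assumption `0014` is hexadecimal for decimal 20, and the string holds six values for the six drawn characters, which would be consistent with glyph indices rather than character codes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RawStringDump
{
    // Hypothetical helper: splits a PDF hex string such as "<0014...>"
    // into its four-digit hexadecimal groups and returns them as integers.
    public static List<int> DecodeHexWords(string hex)
    {
        string digits = hex.Trim('<', '>');
        var values = new List<int>();
        for (int i = 0; i + 4 <= digits.Length; i += 4)
        {
            values.Add(int.Parse(digits.Substring(i, 4),
                                 NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

For the string quoted above, `RawStringDump.DecodeHexWords("<00140014001425F503B20662>")` returns the values 20, 20, 20, 9717, 946, 1634.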
It has been too long since I worked with this…
My assumption would have been that Did you test if the same emoji results in the same number the second time? so
Finally made progress!

```csharp
if (!CharacterToGlyphIndex.ContainsKey(ch) || char.IsHighSurrogate(ch))
    // If high surrogate char hasn't been added yet, add high and low surrogate chars:
    if (!SurrogatePairs.ContainsKey(ch))
        // Collection initializer: new List<char>(c) would treat c as an initial capacity and leave the list empty.
        SurrogatePairs.Add(ch, new List<char> { text[idx + 1] });
    // If high surrogate char has been added and low surrogate char hasn't been added yet, add low surrogate char:
    else if (SurrogatePairs.ContainsKey(ch) && !SurrogatePairs[ch].Contains(text[idx + 1]))
        SurrogatePairs[ch].Add(text[idx + 1]);
    // If high and low surrogate chars have been added, continue with next loop:
    else
        continue;

// To do (for support of reading PDF?): Surrogate pair chars with same high surrogate chars
// and different low surrogate chars are missing in "CharacterToGlyphIndex"!
if (!CharacterToGlyphIndex.ContainsKey(ch))
    CharacterToGlyphIndex.Add(ch, glyphIndex);

private Dictionary<char, List<char>> SurrogatePairs = new Dictionary<char, List<char>>();
```
@TheRealSourceSeeker Thank you for your work. I hope that in the foreseeable future I get to a point where I will need this code. Since I already updated my branch for .NET 6 (I think), Tuple should not be a problem :)
@LokiMidgard Could you report back your result for printing the string "1♥️1"?

```csharp
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
PdfDocument document = new PdfDocument();
PdfPage page = document.AddPage();
XGraphics gfx = XGraphics.FromPdfPage(page);
XPdfFontOptions options = new XPdfFontOptions(PdfFontEncoding.Unicode, PdfFontEmbedding.Always);
XFont font = new XFont("Segoe UI Emoji", 12, XFontStyle.Regular, options);
gfx.DrawString("1♥️1", font, XBrushes.Black, new XRect(0, 0, page.Width, page.Height), XStringFormats.Center);
document.Save("C:\\test.pdf");
```

Does your PDF show "1

```csharp
// Skip "Variation Selector-16" (Unicode decimal: 65039, hexadecimal: FE0F)
// as long as colored emojis aren't supported (only black/white "text presentation", no colored "emoji presentation"):
// Reason: Char "♥️" triggers 2 char matches, writing a visual heart and space to PDF:
// 1. "Black Heart Suit" (hexadecimal: 2665, decimal: 9829)
// 2. "Variation Selector-16" (hexadecimal: FE0F, decimal: 65039)
if (ch == 65039)
    continue;
```
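The same filtering could also be applied to the text before it reaches the drawing code. This is a minimal sketch, assuming it is acceptable to drop U+FE0F entirely while only text presentation is supported; `VariationSelectors` and `StripVariationSelector16` are hypothetical helper names, not PDFsharp API.

```csharp
using System;
using System.Text;

public static class VariationSelectors
{
    // "Variation Selector-16" (U+FE0F, decimal 65039).
    public const char VariationSelector16 = '\uFE0F';

    // Hypothetical helper: returns a copy of the text without any U+FE0F chars.
    public static string StripVariationSelector16(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char ch in text)
        {
            if (ch != VariationSelector16)
                sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

With this helper, the four-char input "1", U+2665, U+FE0F, "1" becomes the three-char string "1", U+2665, "1", so only the heart glyph is looked up in the font.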
Reporting an Issue Here
Surrogate characters (characters that do not fit in 2 bytes) are not drawn correctly.
Expected Behavior
Drawing string with surrogate characters (e.g. 🅐) should draw the correct glyph.
Actual Behavior
Two unrecognizable characters are printed. The surrogate pair is interpreted as two separate characters.
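The difference between UTF-16 code units and code points can be sketched as follows. This is an illustration only, not PDFsharp's implementation; `CodePoints.Enumerate` is a hypothetical helper. It relies on `char.IsHighSurrogate`, `char.IsLowSurrogate`, and `char.ConvertToUtf32` from the .NET base class library.

```csharp
using System;
using System.Collections.Generic;

public static class CodePoints
{
    // Hypothetical helper: walks a UTF-16 string and yields one integer per
    // Unicode code point, combining each valid surrogate pair into one value.
    public static List<int> Enumerate(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(text[i])
                          && i + 1 < text.Length
                          && char.IsLowSurrogate(text[i + 1]);
            if (isPair)
            {
                result.Add(char.ConvertToUtf32(text[i], text[i + 1]));
                i++; // Skip the low surrogate that was just consumed.
            }
            else
            {
                // Ordinary char (or an unpaired surrogate, kept as-is).
                result.Add(text[i]);
            }
        }
        return result;
    }
}
```

For example, the string "A" followed by 🅐 (U+1F150) has three `char` values but two code points, so a glyph lookup that works per `char` sees two unrelated halves instead of one character.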
Steps to Reproduce the Behavior
You can reproduce this with the minimal sample repository