Encoding for getBodyHtml is always UTF8? #515

jahrralf · 2021-08-09T22:44:02Z

Q	A
ddeboer/imap version	1.12.1
PHP version	7.4
IMAP provider	netcup.net

Summary

getBodyHtml always returns UTF8. Parsing this HTML leads to errors because the HTML meta tags still declare a different encoding, and parsers such as DOMDocument follow that declaration. The result is broken umlauts.

Current behavior

When I load the HTML body of a message, it is always transcoded to UTF8, even though the message part is in a different encoding. This is a problem because the HTML itself also carries encoding information (the meta tags), and ddeboer/imap does not change it. When the string is then processed by DOMDocument, umlauts break: DOMDocument honors the meta tags, which still name the original encoding, so saving the document leads to double-encoding (for example, the bytes of each UTF8 character are treated as separate single-byte ANSI characters and encoded again).
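To make the double-encoding concrete, here is a minimal sketch that does not involve ddeboer/imap or DOMDocument at all. It only assumes the mbstring extension is available, and it uses ISO-8859-1 as an example of the "original" charset; the real charset depends on the message.

```php
<?php
// "ü" encoded as UTF8 is the two bytes C3 BC.
$utf8 = "\xC3\xBC";

// A consumer that trusts a stale "charset=ISO-8859-1" declaration reads those
// two bytes as two separate Latin-1 characters and encodes each one again.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), PHP_EOL;          // c3bc
echo bin2hex($doubleEncoded), PHP_EOL; // c383c2bc (displays as "Ã¼")
```

The second output is the typical broken-umlaut pattern: one character has turned into two.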

What I would need is the html body in the original encoding so that it matches with the meta tags.

How to reproduce: code & error stack trace

Load a non-UTF8 message from the server and check the bytes: umlauts are encoded as UTF8 automatically, although the server reported that the message part was not UTF8 encoded.

Expected behavior

getBodyHtml should return the HTML body in the original encoding, or have a parameter to disable the forced UTF8 encoding.
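Until something like that exists, a possible workaround on the caller's side is to convert the string back to the charset the meta tags declare. The sketch below is an assumption-laden illustration, not library code: restoreOriginalCharset is a hypothetical helper, the charset name has to be supplied by the caller (for example, whatever the server reported for that message part), and mbstring must be installed.

```php
<?php
// Hypothetical helper (not part of ddeboer/imap): convert the UTF8 string
// back to the charset that the HTML's own meta tags still declare.
function restoreOriginalCharset(string $utf8Html, string $originalCharset): string
{
    return mb_convert_encoding($utf8Html, $originalCharset, 'UTF-8');
}

// Example input: what a UTF8-transcoded body fragment might look like.
$utf8Html = "<p>M\xC3\xBCller</p>";

// Assumption: the original part was ISO-8859-1.
$latin1Html = restoreOriginalCharset($utf8Html, 'ISO-8859-1');

// The umlaut is a single byte (FC) again, matching a Latin-1 meta declaration.
echo bin2hex($latin1Html), PHP_EOL;
```

The alternative direction, keeping the UTF8 string and changing the declared charset in the HTML instead, would also avoid the mismatch, but it means editing the markup before parsing.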

jahrralf added the bug label Aug 9, 2021
