RMail and malformed UTF-8 headers #205
Comments
Perhaps it is better to use the [0] http://www.ruby-doc.org/core-2.1.0/Regexp.html#class-Regexp-label-Encoding |
The proper fix would of course be to migrate to Mail (#127). |
I think you are missing the actual error from the trace, could you post the full output from the crash? |
I don't have the full log anymore. I should have saved it. I've tried finding the message that triggered it, but failed. It's easy to recreate the error in irb, though:
So the full error log would be this error message, followed by the traceback in the original report. |
Could you try the changes in pull-request #214 with a manually installed version of rmail with sup-heliotrope/rmail-sup#3 applied? |
I set up a separate user account on the same machine the error occurred on and I added the pull requests you mentioned. I did some Bundler stuff to get it all to work. I added my own inbox mbox file to this sup's sources to make sure the malformed headers are present. I ran
This goes beyond my poor knowledge of some email RFCs. |
Hm, this is a different but similar problem. Do you know which address this fails on? |
Here's the hexdump of the address. Or the contents of
Edit: EF BF BD is UTF-8 for U+FFFD, the replacement character. So that first character might have been something different. I'm looking into it. Edit 2: Found it!
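A quick way to check the EF BF BD observation in irb. This is only a sketch: the stray byte `0xE9` is a made-up example, not the byte from the actual message, and `String#scrub` needs Ruby 2.1 or later.

```ruby
# U+FFFD (the replacement character) encoded as UTF-8.
replacement = "\uFFFD"
hex = replacement.bytes.map { |b| format("%02X", b) }.join(" ")
puts hex                      # EF BF BD

# A hypothetical header fragment containing a lone Latin-1 byte (0xE9),
# tagged as UTF-8 even though it is not valid UTF-8.
raw = "caf\xE9".dup.force_encoding(Encoding::UTF_8)
puts raw.valid_encoding?      # false

# scrub replaces each invalid byte with U+FFFD, which is how the original
# byte can get lost before anyone looks at a hexdump.
scrubbed = raw.scrub
puts scrubbed == "caf\uFFFD"  # true
```

So seeing EF BF BD in a dump usually means some earlier step already replaced the original byte.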
Yes, that's most definitely spam. The byte that made sup crash is |
Hm, what version of ruby and rmail are you running? |
I'm using the rmail from sup-heliotrope/rmail-sup#3 and ruby 1.9.3p484. |
Could you post the entire message somewhere? I'm not able to crash sup with just those bits. |
Preferably as a binary upload |
By the way, you can use the environment variable |
This is the message. The latest sup release (without the two pull requests) does not crash on this message. Thanks for the |
Hi, could you try the latest tips of both pull requests? I managed to crash sup again, but for me the test case crashes anyway as long as rubymail isn't patched. Anyway, I don't think I can merge this yet, since I'm not sure the regexps in rubymail make much sense. |
By the way, I added your message to the test base - but removed any references I could find to your address or servers.. hope that is ok. |
Well, sup-sync has been chugging along happily for the past ten minutes. It seems that fixes the crash. And I don't mind my email and server addresses used in a test case. The addresses are plastered all over the Internet anyway. Hence all the spam. ;) |
Yup, but the two changed regexps in sup-heliotrope/rmail-sup#3 have to be fixed before it can be merged. (Again, the real fix is to move to Mail.) |
As reported in #310 setting the locale to C: LANG/LC_ALL=C might bypass this problem. |
Hi, |
I can't get sup to run with 2.2.1, so I'm using ruby 2.0.0-p195. I'm getting this error as well (sup v0.22.1). Was this bug supposed to be fixed elsewhere? |
@gauteh I'm seeing this bug as well:
This is quite annoying as all mails which are sent out trigger this bug and those mails are not saved into -mailbox. Is there a fix for that bug? |
The workaround does not work for me. Is the move to the non-rmail fork of sup the problem? |
Bump. What's the status of this? I keep getting it even on 0.22.1 with rbenv-installed Ruby 2.1.2 on macOS Sierra. I see that the proposed solution seemed to be #127, but the last update on that was in August 2015. Here is my traceback, for what it's worth.
With sup 0.23 and rmail 1.1.4 this bug still exists although the traceback is a little different:
Ahh nope ignore the previous comment. That's a different error which only happens in the test suite, because it uses a different code path for loading test messages. The actual traceback with sup 0.23 and rmail 1.1.4 is this (slightly better than before but still broken):
Okay, so there are two different issues going on here, both related to non-ASCII characters in headers (which are illegal; they should be encoded using RFC 2047 header encoding, but spammers don't follow the rules). Non-ASCII bytes in the mail header itself were crashing sup, but that was fixed in rmail 1.1.2: If there is a nested
wrongly forces the raw message body to be marked as UTF-8 when it should still be ASCII-8BIT because it hasn't been decoded to text yet. The |
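For reference, this is roughly what a well-behaved sender does with a non-ASCII header value under RFC 2047. It is a minimal sketch, not code from sup or rmail: `encode_header_word` is a hypothetical helper, it only produces Base64 ("B") encoded-words, and it ignores the 75-character limit on encoded-words.

```ruby
# Hypothetical helper: wrap a non-ASCII header value in an RFC 2047
# encoded-word of the form =?charset?B?base64?=.
def encode_header_word(text)
  return text if text.ascii_only?          # plain ASCII needs no encoding
  utf8 = text.encode(Encoding::UTF_8)
  "=?UTF-8?B?#{[utf8].pack('m0')}?="       # 'm0' is Base64 without line breaks
end

# Hypothetical inverse for a single UTF-8 "B" encoded-word.
def decode_header_word(word)
  m = /\A=\?UTF-8\?B\?([^?]*)\?=\z/i.match(word)
  return word unless m
  m[1].unpack('m').first.force_encoding(Encoding::UTF_8)
end

encoded = encode_header_word("caf\u00e9")
puts encoded                               # =?UTF-8?B?Y2Fmw6k=?=
puts decode_header_word(encoded) == "caf\u00e9"
```

A header encoded this way is pure ASCII on the wire, so it never reaches the byte-versus-character regexp problem discussed in this thread.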
This patch changes DummySource to follow a similar pattern to the Maildir source, which works by opening the files directly for parsing. Previously DummySource was parsing strings fed through StringIO, which was introducing extra (or at least, different) complications with encoding. See for example the comments in issue sup-heliotrope#205.
In the code for handling message/rfc822 MIME parts, message.rb line 498, we were calling the #normalize_whitespace method on the body string before it was decoded. I'm not too sure if messing with whitespace is the right thing to do there, but that aside, that method was then also calling #fix_encoding! which would forcibly transcode the raw body to UTF-8. Instead, we want to keep the body as ASCII-8BIT at that point, and let it be decoded using all the normal message decoding mechanisms. The only other calls to #normalize_whitespace are in the UI, and in the code path which handles body text of messages, message.rb line 592, where the body text has already been decoded. So it seems like we can safely make #normalize_whitespace just mess with whitespace and leave the string encoding alone. Fixes sup-heliotrope#205.
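A minimal sketch of the behaviour this commit message describes, not the actual sup code: the `normalize_whitespace` below is a simplified stand-in, and the ISO-8859-1 charset is an assumption standing in for whatever the MIME part declares.

```ruby
# Simplified stand-in for a whitespace-only #normalize_whitespace:
# it rewrites tabs and carriage returns and never touches the encoding.
def normalize_whitespace(text)
  text.gsub(/\t/, "    ").gsub(/\r/, "")
end

# Raw, not-yet-decoded body bytes, tagged ASCII-8BIT (binary).
raw_body = "caf\xE9\tau lait\r\n".b

cleaned = normalize_whitespace(raw_body)
puts cleaned.encoding                # ASCII-8BIT: still raw bytes
puts cleaned.bytes.include?(0xE9)    # true: the high byte is untouched

# Decoding happens later, using the charset declared by the message
# (assumed here to be ISO-8859-1).
decoded = cleaned.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts decoded == "caf\u00e9    au lait\n"
```

Because the regexps contain only ASCII, they can be applied to a binary string without raising, so the whitespace cleanup and the charset decoding stay independent steps.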
In broken email messages (mostly spam) the headers aren't always encoded properly (I think?), causing sup to crash. It feeds the UTF-8 string into RMail::Header::Field.parse, which in turn defines an ASCII-8bit regex and matches the string against that. For now I've fixed it like this in rmail-sup's header.rb:
But I've got no idea if it's sup who should send it an ASCII string, or RMail who should coerce it into an ASCII string.
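To make the mismatch concrete, here is a small sketch that can be pasted into irb. The pattern and helper are hypothetical, not copied from rmail's header.rb: `HIGH_BYTE` just stands in for any regexp compiled with the `n` (ASCII-8BIT) flag that mentions bytes above 0x7F, and `high_byte_offset` shows the "coerce on the parser side" option.

```ruby
# Hypothetical byte-oriented pattern; the /n flag plus the \x80-\xff range
# fixes the regexp's encoding to ASCII-8BIT.
HIGH_BYTE = /[\x80-\xff]/n

# One possible approach: re-tag a copy of the field as binary before
# matching. The bytes themselves are not changed.
def high_byte_offset(field)
  HIGH_BYTE =~ field.dup.force_encoding(Encoding::BINARY)
end

header = "Subject: caf\u00e9"        # UTF-8 string with one non-ASCII character

begin
  HIGH_BYTE =~ header                # UTF-8 string vs ASCII-8BIT regexp
rescue Encoding::CompatibilityError => e
  puts "raised #{e.class}"           # raised Encoding::CompatibilityError
end

puts high_byte_offset(header)        # 12 (byte offset of the first high byte)
```

The same match succeeds on a UTF-8 string that happens to be pure ASCII, which is why only some messages trigger the crash.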
And here's the traceback. The line number in header.rb is off a bit, but you get the idea: