don't "fix" encoding of raw message/rfc822 parts
In the code for handling message/rfc822 MIME parts, message.rb line 498,
we were calling the #normalize_whitespace method on the body string
before it was decoded.

I'm not too sure if messing with whitespace is the right thing to do
there, but that aside, that method was then also calling #fix_encoding!
which would forcibly transcode the raw body to UTF-8. Instead, we want to
keep the body as ASCII-8BIT at that point, and let it be decoded using
all the normal message decoding mechanisms.
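
To illustrate why transcoding too early is harmful, here is a minimal sketch using only core Ruby String methods. It does not reproduce sup's #fix_encoding!; the sample header line and the choice of ISO-8859-1 as the eventual charset are assumptions made for the example.

```ruby
# Illustrative only: a raw 8-bit header line as it might appear inside a
# message/rfc822 part. The byte 0xAE is the registered sign in ISO-8859-1.
raw = "Subject: spam \xAE spam".b # .b returns an ASCII-8BIT (binary) copy

# Transcoding the raw bytes to UTF-8 too early: the high byte has no mapping
# from binary, so it is replaced and the original byte value is lost.
premature = raw.encode("UTF-8", invalid: :replace, undef: :replace)

# Keeping the bytes binary and decoding later, once a charset is known.
# ISO-8859-1 is assumed here purely for the sake of the example.
deferred = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts premature.inspect
puts deferred.inspect
```

Once the replacement character has been substituted, no later decoding step can recover the original byte, which is why the body is better left as ASCII-8BIT until the normal decoding path runs.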

The only other calls to #normalize_whitespace are in the UI, and in the
code path which handles body text of messages, message.rb line 592,
where the body text has already been decoded. So it seems like we can
safely make #normalize_whitespace just mess with whitespace and leave
the string encoding alone.
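
As a rough sketch of the resulting behaviour, the whitespace-only version can be exercised on a binary string. The free-standing helper name normalize_whitespace_only is hypothetical (the real method lives on String in lib/sup/util.rb), and the replacement strings are copied from the diff as displayed here.

```ruby
# Hypothetical free-standing equivalent of the whitespace-only method:
# tabs are replaced, carriage returns are dropped, nothing is transcoded.
def normalize_whitespace_only(str)
  str.gsub(/\t/, " ").gsub(/\r/, "")
end

# A binary string containing a tab, a CR and a non-ASCII byte.
raw = "To: a\tb\r\n\xAE".b
cleaned = normalize_whitespace_only(raw)

puts cleaned.encoding          # encoding of the result
puts cleaned.bytes.last == 0xAE # whether the high byte survived untouched
```

Because the patterns contain only ASCII characters, gsub works on the binary string and returns a string in the same encoding, so the later decoding step still sees the original bytes.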

Fixes #205.
danc86 committed Jul 12, 2020
1 parent 4204170 commit d3fbac1
Showing 3 changed files with 59 additions and 1 deletion.
1 change: 0 additions & 1 deletion lib/sup/util.rb
@@ -376,7 +376,6 @@ def transcode to_encoding, from_encoding
   end

   def normalize_whitespace
-    fix_encoding!
     gsub(/\t/, " ").gsub(/\r/, "")
   end

36 changes: 36 additions & 0 deletions test/fixtures/non-ascii-header-in-nested-message.eml
@@ -0,0 +1,36 @@
Return-Path: <[email protected]>
From: SPAM ® <[email protected]>
To: <[email protected]>
Subject: spam ® spam
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_4F506AC2.EE281DC4"
Message-Id: <[email protected]>
Date: Fri, 2 Mar 2012 07:37:55 +0100 (CET)

This is a multi-part message in MIME format.

------------=_4F506AC2.EE281DC4
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Spam detection software, running on the system "a.a.a.a.a.", has
identified this incoming email as possible spam. The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.
------------=_4F506AC2.EE281DC4
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: attachment
Content-Transfer-Encoding: 8bit
From: SPAM ® <[email protected]>
To: <[email protected]>
Subject: spam ® spam
This is a spam.
------------=_4F506AC2.EE281DC4--

23 changes: 23 additions & 0 deletions test/test_message.rb
@@ -248,6 +248,29 @@ def test_nonascii_header
     assert_equal("spam \ufffd spam", sup_message.subj)
   end

+  def test_nonascii_header_in_nested_message
+    source = DummySource.new("sup-test://test_nonascii_header_in_nested_message")
+    source.messages = [ fixture_path("non-ascii-header-in-nested-message.eml") ]
+    source_info = 0
+
+    sup_message = Message.build_from_source(source, source_info)
+    chunks = sup_message.load_from_source!
+
+    assert_equal(3, chunks.length)
+
+    assert(chunks[0].is_a? Redwood::Chunk::Text)
+
+    assert(chunks[1].is_a? Redwood::Chunk::EnclosedMessage)
+    ## TODO need to fix EnclosedMessage#lines
+    #assert_equal(4, chunks[1].lines.length)
+    #assert_equal("From: SPAM \ufffd <[email protected]>", chunks[1].lines[0])
+    #assert_equal("spam \ufffd spam", chunks[1].lines[3])
+
+    assert(chunks[2].is_a? Redwood::Chunk::Text)
+    assert_equal(1, chunks[2].lines.length)
+    assert_equal("This is a spam.", chunks[2].lines[0])
+  end
+
   def test_malicious_attachment_names
     source = DummySource.new("sup-test://test_blank_header_lines")
     source.messages = [ fixture_path('malicious-attachment-names.eml') ]