Defect #19737

HTML Sanitizer not working for Outlook mails

Added by Rupesh J almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Start date:
Priority: High
Due date:
Assignee: Jean-Philippe Lang
% Done: 0%
Category: Email receiving
Target version: 3.1.0
Resolution: Fixed
Affected version: 3.0.2

Description

HTML Sanitizer not working for Outlook mails.
Some unwanted tags are not stripped from the mail body.

sample2.htm (129 KB) Rupesh J, 2015-05-08 16:15

Test email.msg - Outlook 2010 HTML email (no special formatting chosen during composition) (51 KB) Deoren Moor, 2015-05-11 17:59

Redmine-Mail-HTML-Mess-via-OutlookWebAccess2010.eml - Outlook Web Access 2010, HTML Sanitization issue. ASCII EML (4.57 KB) Geoff Maciolek, 2015-05-19 22:51


Related issues

Related to Redmine - Feature #16962: Better handle html-only emails Closed

History

#1 Updated by Toshi MARUYAMA almost 3 years ago

  • Target version set to 3.0.3

#2 Updated by Toshi MARUYAMA almost 3 years ago

This is a Redmine 3.0 regression according to #19537#note-9.

#4 Updated by Toshi MARUYAMA almost 3 years ago

#5 Updated by Deoren Moor almost 3 years ago

+1

I can confirm that Outlook emails are improperly handled:

The actual human-readable text is three lines. A similar issue is also happening with iOS (v7.x) emails, as I reported on #15716.

Another issue I noticed along with this one is that the "Truncate emails after one of these lines" setting does not appear to be working. I've opened #19740 for that.
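
For reference, here is a minimal sketch of the behaviour expected from that setting: everything from the first line that matches a configured delimiter onward is dropped. This is not Redmine's implementation. The function name `truncate_after_delimiters` and the delimiter text are hypothetical, chosen only for illustration.

```python
def truncate_after_delimiters(body, delimiters):
    """Return the part of `body` that precedes the first delimiter line.

    Hypothetical helper, not Redmine code. A line counts as a delimiter
    when, ignoring surrounding whitespace, it equals one of `delimiters`.
    """
    wanted = {d.strip() for d in delimiters if d.strip()}
    kept = []
    for line in body.splitlines():
        if line.strip() in wanted:
            break  # drop the delimiter line and everything after it
        kept.append(line)
    return "\n".join(kept).rstrip()


# Usage with a made-up delimiter phrase:
reply = "Reply text\nSecond line\n-- Reply above this line --\nquoted original mail"
print(truncate_after_delimiters(reply, ["-- Reply above this line --"]))
```

A whole-line comparison like this only matches when the delimiter arrives on a line by itself, which is one reason the plain-text body matters.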

Additional notes:

  • I tested sending from Outlook and converting to plain text when replying, and the email was scraped properly. The truncate option also seemed to work properly.
  • Same (good, expected) results when responding via Exchange web mail and first converting the format to plain text before replying.
  • I tested using Exchange web mail and got similar results (see below).
  • I tested replying using Outlook and leaving the email as-is, and the email was rejected with "Validation failed: Documented cannot be blank". Presumably this is because the Truncate option is not working properly: the custom Documented field is listed well below the Truncate phrase that we're using.

Reply from Exchange webmail, left in HTML format:
<!--
body {font-family:Verdana,sans-serif;
font-size:0.8em;
color:#484848}
h1, h2, h3 {font-family:"Trebuchet MS",Verdana,sans-serif;
margin:0px}
h1 {font-size:1.2em}
h2, h3 {font-size:1.1em}
a, a:link, a:visited {color:#2A5685}
a:hover, a:active {color:#c61a1a}
fieldset.attachments {border-width:1px 0 0 0}
hr {width:100%;
height:1px;
background:#ccc;
border:0}
span.footer {font-size:0.8em;
font-style:italic}
-->
P {margin-top:0;margin-bottom:0;}

Responding via web interface. I'm leaving the format as HTML and not manually pruning existing text in the reply.

#6 Updated by Jean-Philippe Lang almost 3 years ago

  • Status changed from New to Needs feedback

HTML Sanitizer not working for Outlook mails.

Please attach such an email (with dummy text), so we can start working on this.

#7 Updated by Rupesh J almost 3 years ago

I saved the Outlook .msg file as HTML.
Let me know if this attachment is OK.

Thanks.

#8 Updated by Jean-Philippe Lang almost 3 years ago

A .eml file that contains the full email source would be better, thanks.

#9 Updated by Jean-Philippe Lang almost 3 years ago

  • Target version changed from 3.0.3 to 3.0.4

#10 Updated by Deoren Moor almost 3 years ago

Jean-Philippe Lang wrote:

A .eml file that contains the full email source would be better, thanks.

From the attached msg file the following was scraped and inserted into the OP of a new issue on our Redmine v3.0.2 installation (raw text, grabbed by choosing to "edit" the OP so I could copy/paste here):

Thanks for your work on this!

#11 Updated by Deoren Moor almost 3 years ago

If I can provide any further information please let me know. Thanks.

#12 Updated by Geoff Maciolek almost 3 years ago

For another data point, I've uploaded another .eml file (and included the original content below), pulled via IMAP right out of the mailbox; I've sanitized the company info, but it's otherwise unmodified. I believe you'll find this easier to work with than Deoren Moor's message, as that seems to have some Outlook-specific content. This one is plain Jane ASCII (with HTML embedded, of course).

I had attempted to include some friendly expanding code blocks to show what arrived, but when I did so I got blocked as a spammer, so... I'm trying a bit less:

Note that the below "What I get in Redmine as source" is also what I get if I "edit" the comment.

#13 Updated by Geoff Maciolek almost 3 years ago

As you can see, it looks like it's not parsing the HTML comment inside HEAD as a comment, but then something else weird happens that's leading to the P {...} section showing up as well. It looks to me like this would be resolved by having everything inside <HEAD>...</HEAD> get ignored - maybe easier said than done?
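
The "ignore everything inside HEAD" idea can be sketched as follows, using only Python's standard `html.parser`. This is an illustration of the suggestion, not the fix that was applied in Redmine. The names `VisibleTextExtractor`, `html_to_text` and `SAMPLE` are hypothetical, and the sample markup is a shortened stand-in for the real mails.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text, skipping anything nested in <head>, <style> or <script>."""

    SKIPPED = {"head", "style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Style content (including a "<!-- ... -->" wrapper) arrives here
        # as plain data, so it must be filtered by position, not by type.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


SAMPLE = """<html><head><style><!--
body {font-family:Verdana,sans-serif}
--></style><style>P {margin-top:0;margin-bottom:0;}</style></head>
<body><p>Responding via web interface.</p></body></html>"""

print(html_to_text(SAMPLE))
```

Tracking a skip depth instead of deleting comments handles both cases reported here: the commented-out stylesheet and the bare `P {...}` rule each sit inside a `<style>` element within `<head>`. A mail with an unclosed `<head>` would lose its body under this sketch, so a real implementation would need to be more forgiving.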

#14 Updated by Deoren Moor almost 3 years ago

Is there any further information we can provide?

#15 Updated by Jean-Philippe Lang almost 3 years ago

  • Status changed from Needs feedback to Closed
  • Assignee set to Jean-Philippe Lang
  • Target version changed from 3.0.4 to 3.1.0
  • Resolution set to Fixed

Thanks for the examples, these 2 are now fixed as part of #16962 and added as tests.
