Project

General

Profile

Actions

Feature #16962

closed

Better handle html-only emails

Added by Felix Schäfer over 11 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Category:
Email receiving
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Resolution:
Fixed

Description

Currently html-only emails are handled by just removing all html tags, this causes really munched together formatting, because html doesn't rely on newlines for formatting, and styles present in the head are also just passed over.


Files

16962.patch (2.21 KB) 16962.patch Felix Schäfer, 2014-05-23 13:42
16962.2.patch (2.59 KB) 16962.2.patch Felix Schäfer, 2014-05-23 14:06
Snip20141227_1.png (28.6 KB) Snip20141227_1.png Krishna Gollamudi, 2014-12-27 18:49
Test Email.msg (21 KB) Test Email.msg Krishna Gollamudi, 2014-12-29 17:39

Related issues

Related to Redmine - Feature #8462: Save received email as an attachment, not as a commentNew2011-05-30

Actions
Related to Redmine - Defect #12025: rake receive_pop3 is stripping formatting and new linesClosed

Actions
Related to Redmine - Defect #18374: Incoming emails are truncating CRLFClosed

Actions
Related to Redmine - Defect #19737: HTML Sanitizer not working for Outlook mailsClosedJean-Philippe Lang

Actions
Related to Redmine - Defect #19740: "Truncate emails after one of these lines" setting is not workingClosed

Actions
Related to Redmine - Defect #10201: MailHandler ignores line feeds when handling html-only emailsClosed

Actions
Related to Redmine - Defect #13284: Should not try to strip HTML tags from text/plain MIME partClosed

Actions
Related to Redmine - Defect #6705: Incoming emails from Apple Mail not parsed correctlyClosed2010-10-20

Actions
Related to Redmine - Patch #12937: mail_handler patchClosed

Actions
Related to Redmine - Defect #14668: Incoming emails has wrong line breaksClosed

Actions
Related to Redmine - Feature #13844: Option to enable HTML emails for issue creation.Closed

Actions
Has duplicate Redmine - Feature #16034: Support for HTML in incoming mailsClosed

Actions
Actions #1

Updated by Felix Schäfer over 11 years ago

The attached patch expands the html-only email used for testing by markup similar to what we have experienced at Planio. A growing number of users send html-only emails, especially Outlook users.

The test run with the current MailHandler results in:

p {font-size:12.0pt;}\r\n\r\n\r\nThis is a html-only email.With a titleand a paragraph.

There is content from the head and the actual email text is all squashed together.

The result with the patch applied is:

This is a html-only email.\r\nWith a title\r\n\r\nand a paragraph.

The loofah gem parses the html based on Nokogiri and tries to handle tags a little more intelligently than just removing them.

Actions #2

Updated by Jan from Planio www.plan.io over 11 years ago

  • Tracker changed from Feature to Patch
  • Status changed from New to Needs feedback
  • Target version set to Candidate for next minor release
Actions #3

Updated by Felix Schäfer over 11 years ago

Here is an updated patch to only convert html parts to text and leave text parts alone.

Actions #4

Updated by Toshi MARUYAMA over 11 years ago

  • Tracker changed from Patch to Feature
Actions #5

Updated by Toshi MARUYAMA over 11 years ago

  • Has duplicate Feature #16034: Support for HTML in incoming mails added
Actions #7

Updated by Joe Darkless over 11 years ago

I think you are mixing things up. So to shed some light on this problem I'll be verbose. (if there are any misstatements feel free to correct them.) (Oh and I have redmine version 2.4.0)

There are 2 issues concerning HTML formatting from incomming mails.

The first one:

Parsing incoming mails and storing them into database

This process is made in app/models/mail_handler.rb. Around line 400 there is a method called plain_text_body, which (imho) takes care of formatting/stripping email body and is called by receive_imap method (indirectly)

This method really sux and doesn't work as it should at all. There can be 3 scenario email. Plaintext only email, HTML only email, plaintext with html both included in email.

If the plaintext email is sent, this method does its job more or less. Same with both formats included. It just grabs plaintext and its done. And It really had problem with HTML only email. And thats what you stated patch here is for... (though thew me an error that some method of loofa doesn't exist - but I think thats due to my version of redmine) Without the patch the HTML mail is just stored in DB as is, without any striping. Anyway it shoud do the parsing for new lines (or BRs) better.

The second one

Displaying description content on issue page

This bug is relevant mainly to CKEditor. If you have disabled CKEditor for Project the "plain-text" mail created issues are displayed corectly. But if you turn the "WYSIWYG" editor on there are no LINE BREAKS and the text is rendered on one line. Referring to issue 12025 (http://www.redmine.org/issues/12025) there is probbably no processing of plain_text to HTML to display the content properly. And this bug is still not solved by stated patch.

Actions #8

Updated by Felix Schäfer over 11 years ago

Joe, you are right that this patch only treats your stated point 1, I don't think I stated otherwise either.

Your second point is irrelevant to core Redmine currently, as (at least last time I checked) Redmine ever only stores text with markup (textile or markdown) but at no time full html. If you use a rich text editor with your Redmine or if your Redmine stores plain HTML, it comes from a plugin and that would be the concern of the plugin author, not of core Redmine. This second point is thus not relevant here and I will not engage in further discussion about it in this ticket. If you think this is relevant to Redmine, please open another ticket about it.

Actions #9

Updated by Felix Schäfer over 11 years ago

Joe Darkless wrote:

Parsing incoming mails and storing them into database

This process is made in app/models/mail_handler.rb. Around line 400 there is a method called plain_text_body, which (imho) takes care of formatting/stripping email body and is called by receive_imap method (indirectly)

MailHandler#plain_text_body is exactly the method the proposed patch changes.

This method really sux

That's a little more harsh than what I'd say, but yes.

and doesn't work as it should at all.

No, it works as it should: provide a plain-text version of an incoming email's body. You might not agree with the method to achieve that goal, but it does achieve it.

There can be 3 scenario email. Plaintext only email, HTML only email, plaintext with html both included in email.

If the plaintext email is sent, this method does its job more or less. Same with both formats included. It just grabs plaintext and its done. And It really had problem with HTML only email.

If the email has at least one text-only part, then all text parts are kept. If the email has no text part but html parts, then all html parts are kept. Any html part then is stripped of all tags before being returned source:app/models/mail_handler.rb#L431, but this tag-stripping is very basic and lacking.

And thats what you stated patch here is for...

Yes. The patch changes the handling of said html parts and instead of just blindly removing tags and tries to make some more sense of the html markup (still at a very basic level though).

I don't know what in the title or description made you think otherwise.

(though thew me an error that some method of loofa doesn't exist - but I think thats due to my version of redmine)

It shouldn't. Did you run bundle install after applying the patch and restarted your Redmine instance before trying it out? If this doesn't help, could you please provide me with the error you got?

Without the patch the HTML mail is just stored in DB as is, without any striping.

No. Vanilla Redmine doesn't store plain HTML anywhere in the DB, if that is the case for you this is caused by a plugin.

Anyway it shoud do the parsing for new lines (or BRs) better.

That is exactly what this patch addresses: currently the MailHandler will just drop for example <br/>, which means this

This is the first line.<br/>This is the second line.

will currently just become

This is the first line.This is the second line.

whereas with the patch it would become

This is the first line.

This is the second line.

I hope I was able to clear up any misunderstanding you had about this ticket or the attached patch.

Actions #10

Updated by Joe Darkless over 11 years ago

Hello,
thanks for the fast response. I admit that I was a bit harsh, because as someone who learnt ruby during last 3 days by just diggin in the code I was upset. So sorry for that, I didn't mean it.

The reason that I have included reference to the second problem (which is connected to external plugin) was because to fix problem you need to fix both of the issues, and I think that also other shoud know that there is a problem. And I definitely understad that this has nothing to do with redmine core. So as a quickfix for the second stated issue I'm going to redirect others here: https://github.com/a-ono/redmine_ckeditor/issues/89

Ok, and concerning the first issue, which is related to redmine core:
Yes I have run bundle install respectively (rvm 2.0.0 exec bundle install) And it finished sucessfuly with loofa gem installed.
Yes I have restarted httpd server (with passenger)
I have posted the error on pastebin http://pastebin.com/Cp3fCMhM (not sure about pasting politic so please feel free to post it here or include as a file)
I really wasn't able to troubleshoot this error with my 3day knowledge of ruby and didn't find relevant issue on google.

Actions #11

Updated by Felix Schäfer over 11 years ago

Joe Darkless wrote:

thanks for the fast response. I admit that I was a bit harsh, because as someone who learnt ruby during last 3 days by just diggin in the code I was upset. So sorry for that, I didn't mean it.

The Redmine code has a lot of baggage and really shows its age in some parts, so it's probably not the best codebase to get started with Ruby and Rails (sadly). I'd say kudos for keeping at it for 3 days :-)

Ok, and concerning the first issue, which is related to redmine core:
Yes I have run bundle install respectively (rvm 2.0.0 exec bundle install) And it finished sucessfuly with loofa gem installed.
Yes I have restarted httpd server (with passenger)
I have posted the error on pastebin http://pastebin.com/Cp3fCMhM (not sure about pasting politic so please feel free to post it here or include as a file)
I really wasn't able to troubleshoot this error with my 3day knowledge of ruby and didn't find relevant issue on google.

It seems I had developed the patch with Loofah 1.x and that since Loofah 2.0 has been released, which requires a little more legwork. Could you try adding

require "loofah/helpers"

at the top of the patched mail_handler.rb? I don't have the time to test it right now, but that should do it.

Actions #12

Updated by Joe Darkless over 11 years ago

Felix Schäfer wrote:

It seems I had developed the patch with Loofah 1.x and that since Loofah 2.0 has been released, which requires a little more legwork. Could you try adding

[...]

at the top of the patched mail_handler.rb? I don't have the time to test it right now, but that should do it.

Yes that fixed the problem. Thank you very much.

Actions #13

Updated by Joe Darkless over 11 years ago

Excuse me for inconvenience but I don't know how to link issue 'Related to' another record.

This Issue is somehow related to Feature http://www.redmine.org/issues/8462 and in my opinion it is good to know.

Actions #14

Updated by Toshi MARUYAMA over 11 years ago

  • Related to Feature #8462: Save received email as an attachment, not as a comment added
Actions #15

Updated by Yehuda Katz over 11 years ago

Patch 2 works for me (although I had to apply it manually, it did not patch cleanly). I also had to add the require "loofah/helpers" line.

Actions #16

Updated by Toshi MARUYAMA about 11 years ago

  • Related to Defect #12025: rake receive_pop3 is stripping formatting and new lines added
Actions #17

Updated by Felix Schäfer almost 11 years ago

Just a quick note that Rails starting with 4.2.0 will use Loofah internally http://edgeguides.rubyonrails.org/4_2_release_notes.html#html-sanitizer So this will only be an additional dependency until then.

Actions #18

Updated by Toshi MARUYAMA almost 11 years ago

  • Related to Defect #18374: Incoming emails are truncating CRLF added
Actions #19

Updated by Krishna Gollamudi almost 11 years ago

Felix Schäfer wrote:

Just a quick note that Rails starting with 4.2.0 will use Loofah internally http://edgeguides.rubyonrails.org/4_2_release_notes.html#html-sanitizer So this will only be an additional dependency until then.

After applying this patch and reverting the changes, I am getting Gem not found errorrs. Can you please help me? Here below is the complete log.
.0.21/helper-scripts/passenger-spawn-server 102

Actions #20

Updated by Felix Schäfer almost 11 years ago

Make backups before applying patches.

Other than that, I think that running bundle install in your Redmine directory and restarting apache should do the trick.

Actions #21

Updated by Krishna Gollamudi almost 11 years ago

Felix Schäfer wrote:

Make backups before applying patches.

Other than that, I think that running bundle install in your Redmine directory and restarting apache should do the trick.

I did, but still having the issues. The Gem is installed but somehow it is not being detected. Here is my bundle env. Do you find anything missing?

Actions #22

Updated by Felix Schäfer almost 11 years ago

Check if you can run the built-in rails console or server for Redmine. If it works, the problem is with Passenger or your Passenger configuration.

Actions #23

Updated by Felix Schäfer almost 11 years ago

(and this is the wrong place to get help with your installation, as it has nothing to do with the original ticket. For further help I suggest heading to the forums.)

Actions #24

Updated by Krishna Gollamudi almost 11 years ago

Felix:

I am able to identify the root cause and fix it. Has to do with the ruby and gems path but my emails are still loosing the HTML tags while loading into Redmine. Does your above patch works for Redmine 2.5.2 ?

Actions #25

Updated by Jean-Philippe Lang almost 11 years ago

  • Target version changed from Candidate for next minor release to Candidate for next major release

With the patch applied, new lines are inserted as expected but this doesn't solve 2. of #13209: new lines in incoming html should be removed, eg.

<p>This is
a paragraph</p>

should be converted to a single line: "This is a paragraph".

Actions #26

Updated by Jean-Philippe Lang almost 11 years ago

A simple fix could be gsub(/\s+/, ' ') the html before processing it with Loofah.

Actions #27

Updated by Krishna Gollamudi almost 11 years ago

Jean-Philippe Lang wrote:

With the patch applied, new lines are inserted as expected but this doesn't solve 2. of #13209: new lines in incoming html should be removed, eg.

[...]

should be converted to a single line: "This is a paragraph".

Jean, Are you able to retain tables coming from the incoming emails?

Actions #28

Updated by Felix Schäfer almost 11 years ago

Krishna Gollamudi wrote:

I am able to identify the root cause and fix it. Has to do with the ruby and gems path but my emails are still loosing the HTML tags while loading into Redmine. Does your above patch works for Redmine 2.5.2 ?

That's the point: before the patch, if the incoming email has no text-part, the html-part is taken, but with all HTML tags removed. With the patch, in the case of no text-part, the html-part is taken, still with all HTML tags removed, but with certain tags replaced with newlines to keep paragraphs and alike.

Actions #29

Updated by Felix Schäfer almost 11 years ago

Jean-Philippe Lang wrote:

With the patch applied, new lines are inserted as expected but this doesn't solve 2. of #13209: new lines in incoming html should be removed, eg.

[...]

should be converted to a single line: "This is a paragraph".

I had not considered this case. I'll see if there is an option in loofah so that a wholesale replacing of all whitespaces in the whole document is not needed.

Actions #30

Updated by Felix Schäfer almost 11 years ago

Krishna Gollamudi wrote:

Jean, Are you able to retain tables coming from the incoming emails?

If you mean html tables: no, this method transforms the html-part of a text-part-less email to a somewhat equivalent text, but does not know about textile or markdown markup that might be used to render that text.

Actions #31

Updated by Krishna Gollamudi almost 11 years ago

Felix Schäfer wrote:

Krishna Gollamudi wrote:

Jean, Are you able to retain tables coming from the incoming emails?

If you mean html tables: no, this method transforms the html-part of a text-part-less email to a somewhat equivalent text, but does not know about textile or markdown markup that might be used to render that text.

Felix, I mean if the incoming email has data like the attached, then the issue is getting created just with the text in separate line.

Actions #32

Updated by Felix Schäfer almost 11 years ago

Krishna Gollamudi wrote:

Felix, I mean if the incoming email has data like the attached, then the issue is getting created just with the text in separate line.

I guess that would be the case, but I'd have to test that. Could you either attach to this issue or send me (my email address should be on my user page) if you'd rather not have it accessible here the source of an email (in .eml format) with such a table so that I can test it?

Actions #33

Updated by Krishna Gollamudi almost 11 years ago

Felix Schäfer wrote:

Krishna Gollamudi wrote:

Felix, I mean if the incoming email has data like the attached, then the issue is getting created just with the text in separate line.

I guess that would be the case, but I'd have to test that. Could you either attach to this issue or send me (my email address should be on my user page) if you'd rather not have it accessible here the source of an email (in .eml format) with such a table so that I can test it?

No problem Felix. Here is the email in .msg format straight from my outlook account.

Actions #34

Updated by Krishna Gollamudi almost 11 years ago

Got any update Felix??

Actions #35

Updated by Felix Schäfer almost 11 years ago

Krishna, I had first to transform your .msg file to .eml format, I'm not sure how good the converters are, but I was able to see the same content as is visible in your screenshot.

The result from the .eml file is:

\n\n \n\nTeam project: \n\nTe\n\nArea: \n\n\\\\Te\\\\OH 30\n\nIteration: \n\n\\\\Te\\\\OH 30\\\\Iteration 1\n\nAssigned to: \n\nKP\n\nState: \n\nNew\n\nReason: \n\nNew\n\nChanged by: \n\nGo\n\nChanged date: \n\n12/19/2014 3:21:02 PM\n\nChanged fields \n\nField \n\nNew Value \n\nHistory\n\nPer triage QA confirmed this is happening when user trying to see history.\n\n \n\nField \n\nNew Value \n\nOld Value \n\nTriage\n\nNeed More Info - Dev\n\nNeed More Info - QA\n\nAssigned To\n\nKP\n\nGo\n\n \n\n

which basically means that every table cell gets thrown on its own line. The first few lines would look like:

START

Team project:

Te

Area:

\\Te\\OH 30

Iteration:

\\Te\\OH 30\\Iteration 1

Assigned to:

KP
END

With the current method that would be like:

START
 Team project: TeArea: \\Te\\OH 30Iteration: \\Te\\OH 30\\Iteration 1Assigned to: KP
END

Please note that the converted .eml had a text part, so the patch proposed here would not change how this email is imported to Redmine, as the text part has precedence over the html part in any case.

Actions #36

Updated by Felix Schäfer almost 11 years ago

Jean-Philippe Lang wrote:

A simple fix could be gsub(/\s+/, ' ') the html before processing it with Loofah.

I though about it a bit more and that would also mangle whitespace in places where it is significant in html, for example in < pre> blocks. For whitespace that wouldn't be a problem as it would be just text in textile, where any number of whitespace will just get collapse in the rendered html afterwards, but < pre>foo\nbar\nbaz would then be changed to < pre>foo bar baz and converted to \nfoo bar baz\n instead of \nfoo\nbar\nbaz\n.

I have looked for a nicer solution with loofah or nokogiri but couldn't find one yet. I have asked on the loofah tracker for advice on this matter.

Actions #37

Updated by Toshi MARUYAMA over 10 years ago

  • Related to Defect #19737: HTML Sanitizer not working for Outlook mails added
Actions #38

Updated by Toshi MARUYAMA over 10 years ago

  • Related to Defect #19740: "Truncate emails after one of these lines" setting is not working added
Actions #39

Updated by Jean-Philippe Lang over 10 years ago

  • Status changed from Needs feedback to Closed
  • Assignee set to Jean-Philippe Lang
  • Target version changed from Candidate for next major release to 3.1.0
  • Resolution set to Fixed

Loofah is now used for processing HTML-only emails (r14313). HTML processing was moved from the MailHandler to the text formatter so we can convert it to wiki syntax.

Actions #40

Updated by Deoren Moor over 10 years ago

Jean-Philippe Lang wrote:

Loofah is now used for processing HTML-only emails (r14313). HTML processing was moved from the MailHandler to the text formatter so we can convert it to wiki syntax.

Should the changes have also solved processing issues with Apple Mail emails (#15716) from Mac OS X and iOS?

Actions #41

Updated by Go MAEDA over 6 years ago

  • Related to Defect #10201: MailHandler ignores line feeds when handling html-only emails added
Actions #42

Updated by Go MAEDA over 6 years ago

  • Related to Defect #13284: Should not try to strip HTML tags from text/plain MIME part added
Actions #43

Updated by Go MAEDA over 6 years ago

  • Related to Defect #6705: Incoming emails from Apple Mail not parsed correctly added
Actions #44

Updated by Go MAEDA over 6 years ago

Actions #45

Updated by Go MAEDA over 6 years ago

  • Related to Defect #14668: Incoming emails has wrong line breaks added
Actions #46

Updated by Go MAEDA over 6 years ago

  • Related to Feature #13844: Option to enable HTML emails for issue creation. added
Actions

Also available in: Atom PDF