Feature #19289

Exclude attachments from incoming emails based on file content or file hash

Added by Mikhail Voronyuk almost 3 years ago. Updated almost 2 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Email receiving
Target version: -
Resolution:

Description

We have a problem similar to #3413: when tickets are created via email, a lot of signature images end up as Redmine attachments.
The difference is that our company uses the IBM Notes thick client and web client, so:
  1. the signature images and the inline images (e.g. user screenshots pasted into the email body with Print Screen and Ctrl+V) have different names in each new email
  2. there is no way to distinguish the inline images from the signature images by name, by size (a useful user screenshot is sometimes smaller than a signature image) or by other tags in the email body
  3. an image that appears several times in a forwarded email gets multiple names, although the copies have similar size and similar binary content

Here they are:
!2015-03-06 10-37-50.png!
!2015-03-06 10-41-03.png!

I know that Redmine can filter attachments by file mask. As you can see, that is useless in this case.
We also want to keep useful inline images, because we cannot ask users to stop pasting screenshots directly into the email body and instead save each screenshot as a file and attach it (for the user it would be simpler not to report the bug at all than to do this :) ). So the patch mail_handler_ignore_inline_attachments_patch in #3413 is useless for us too.

I thought of creating a list of ignored attachments (e.g. a directory on the Redmine server with all possible signature image files, or a list of file hashes) and writing a patch that compares the binary content or hash of each new email attachment with the ignored list. But I'm a newbie in Ruby and I would appreciate any help.

I guess that the patch should go in app/models/mail_handler.rb, in the accept_attachment? method, and should use FileUtils.cmp for the comparison. But I have no idea what attachment.decoded contains or how to compare it to the master copy.
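
A minimal sketch of the comparison asked about above, assuming attachment.decoded yields the attachment's raw bytes as a Ruby String (please verify this against the mail library version in use). FileUtils.cmp compares two files on disk, so for data that is already in memory it may be simpler to compare against File.binread. The helper name matches_ignored_file? and the directory layout are hypothetical and are not taken from the attached patch.

```ruby
require 'tmpdir'

# Hypothetical helper: true when `bytes` is byte-for-byte identical to any
# regular file in `ignored_dir`.
def matches_ignored_file?(bytes, ignored_dir)
  data = bytes.b # binary copy, so string encodings do not affect the comparison
  Dir.children(ignored_dir).any? do |name|
    path = File.join(ignored_dir, name)
    File.file?(path) && File.binread(path) == data
  end
end

# Usage with a throwaway directory standing in for the ignored list.
Dir.mktmpdir do |dir|
  File.binwrite(File.join(dir, 'signature.png'), "\x89PNG fake signature".b)
  puts matches_ignored_file?("\x89PNG fake signature".b, dir)
  puts matches_ignored_file?('user screenshot'.b, dir)
end
```

This reads every ignored file in full for each attachment, which is the cost that the hash-based suggestions in the comments below try to avoid.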

2015-03-06 10-41-03.png - using IBM Notes thick client (35.6 KB) Mikhail Voronyuk, 2015-03-06 08:49

2015-03-06 10-37-50.png - using IBM iNotes web client (36.5 KB) Mikhail Voronyuk, 2015-03-06 08:49

0001-Filter-email-attachments-based-on-content-ignore-fil.patch (1.3 KB) Mikhail Voronyuk, 2015-03-08 09:09


Related issues

Related to Redmine - Patch #25215: Re-use existing identical disk files for new attachments Closed

History

#1 Updated by Mikhail Voronyuk almost 3 years ago

Google and Stack Overflow helped me write a patch =)

The additional code compares the binary content of each new email attachment with the ignored list (a directory on the Redmine server containing all files to be ignored).

Maybe someone will find the patch helpful.

#2 Updated by Toshi MARUYAMA over 2 years ago

  • Status changed from Resolved to New

In your patch, '/home/redmine/redmine-ignored-attachments/*' is hard-coded.
And saving and comparing files anytime is very expensive.
I think it is better to use hash (e.g. md5sum).
Redmine uses md5sum for attachments.
source:tags/3.0.1/app/models/attachment.rb#L108

#3 Updated by Mikhail Voronyuk over 2 years ago

Toshi MARUYAMA wrote:

> In your patch, '/home/redmine/redmine-ignored-attachments/*' is hard-coded.

Where do you propose to save the files to be ignored, or their hashes? I thought about the Files section, but there is no way to separate useful files from the files to be ignored, except by using a separate project for that or a special filename or description, e.g. "ignored".

> And saving and comparing files anytime is very expensive.
> I think it is better to use hash (e.g. md5sum).

I agree that using a hash would be better.

#4 Updated by Toshi MARUYAMA over 2 years ago

The path should be configurable, like "attachments_storage_path".
source:tags/3.0.3/config/configuration.yml.example#L69

#5 Updated by Manuel Mai over 2 years ago

This code works perfectly for me in Redmine 3.0 on Windows.
I implemented the MD5-check.

require 'digest/md5'

# Meant to run inside the mail handler's attachment check, where
# `attachment` and `logger` are available.
ignoreddir = "C:\\redmine\\redmine-ignored-attachments\\"
md5_attachment = Digest::MD5.hexdigest(attachment.body.decoded)
Dir.foreach(ignoreddir) do |ignoredf|
  # Skip the current and parent directory entries
  next if ignoredf == '.' || ignoredf == '..'
  md5_ignored = Digest::MD5.file(File.join(ignoreddir, ignoredf)).hexdigest
  if md5_ignored == md5_attachment
    logger.info "MailHandler: ignoring attachment #{attachment.filename} (#{md5_attachment}) matching #{ignoredf} (#{md5_ignored})"
    return false
  end
end

To do:
- Configurable path in configuration.yml
- Cache MD5 hashes in database to avoid high load on hard drive because of hashing every file every time an email comes in
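
A rough sketch of the caching idea in the second to-do item. Note that it swaps the database for a simple in-memory cache, which is an assumption made only to keep the example self-contained: each ignored file is re-hashed only when its modification time or size changes. The class name IgnoredDigestCache is hypothetical.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical in-memory cache of MD5 digests for the ignored files.
class IgnoredDigestCache
  def initialize(dir)
    @dir = dir
    @entries = {} # path => [mtime, size, md5]
  end

  # Returns the MD5 hex digests of all regular files in the directory,
  # re-hashing a file only if its mtime or size differs from the cached entry.
  def digests
    paths = Dir.children(@dir).map { |name| File.join(@dir, name) }
    paths = paths.select { |path| File.file?(path) }
    @entries.select! { |path, _| paths.include?(path) } # forget deleted files
    paths.map do |path|
      stat = File.stat(path)
      cached = @entries[path]
      unless cached && cached[0] == stat.mtime && cached[1] == stat.size
        cached = @entries[path] = [stat.mtime, stat.size, Digest::MD5.file(path).hexdigest]
      end
      cached[2]
    end
  end

  # True when the given attachment bytes hash to one of the ignored digests.
  def ignored?(bytes)
    digests.include?(Digest::MD5.hexdigest(bytes))
  end
end

# Usage with a throwaway directory standing in for the ignored list.
Dir.mktmpdir do |dir|
  File.binwrite(File.join(dir, 'signature.png'), 'fake signature bytes')
  cache = IgnoredDigestCache.new(dir)
  puts cache.ignored?('fake signature bytes')
  puts cache.ignored?('user screenshot bytes')
end
```

An in-memory cache is lost when the process restarts, so a persistent store (as the to-do suggests) would still be needed if emails are handled by short-lived processes.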

#6 Updated by Jos Groot Lipman over 2 years ago

For a very easy optimization you might first compare the size of the files. If the sizes differ there is no need to calculate the MD5 of the ignored file.
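
Something like the following could illustrate the idea. It is only a sketch: the method name ignored_by_size_then_md5? is made up, and `bytes` stands for the decoded attachment data. A file is hashed only when its size equals the attachment's size, and the attachment itself is hashed at most once.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper: compares sizes first and computes MD5 only for
# ignored files whose size matches the attachment.
def ignored_by_size_then_md5?(bytes, ignored_dir)
  md5 = nil # computed lazily, only if some file has a matching size
  Dir.children(ignored_dir).any? do |name|
    path = File.join(ignored_dir, name)
    next false unless File.file?(path) && File.size(path) == bytes.bytesize
    md5 ||= Digest::MD5.hexdigest(bytes)
    Digest::MD5.file(path).hexdigest == md5
  end
end

# Usage with a throwaway directory standing in for the ignored list.
Dir.mktmpdir do |dir|
  File.binwrite(File.join(dir, 'signature.png'), 'fake signature bytes')
  puts ignored_by_size_then_md5?('fake signature bytes', dir)
  puts ignored_by_size_then_md5?('short', dir)
end
```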

#7 Updated by Sebastian Paluch almost 2 years ago

same painful problem here :'(

just eliminating duplicated attachments (#15257) would also help.

#8 Updated by Toshi MARUYAMA almost 2 years ago

#9 Updated by Go MAEDA 10 months ago

#10 Updated by Go MAEDA 10 months ago

  • Related to Patch #25215: Re-use existing identical disk files for new attachments added
