Feature #2371

character encoding for attachment file

Added by youngseok yi over 5 years ago. Updated almost 3 years ago.

Status:ClosedStart date:2008-12-22
Priority:LowDue date:
Assignee:Toshi MARUYAMA% Done:

100%

Category:Attachments
Target version:1.3.0
Resolution:Fixed

Description

As r814, default encoding for repository can be configured.

diff or patch attachment requires similar configuration.

I thinks 2nd option may be enough and useful.

attachment-encoding.patch Magnifier (1017 Bytes) Yuya Nishihara, 2010-06-27 12:14

general-settings.png (28.2 KB) Toshi MARUYAMA, 2011-11-17 11:17


Related issues

Related to Defect #9143: Partial diff comparison should be done on actual code, no... Closed 2011-08-29
Related to Defect #4608: Mail attachment name encoding is incorectly handled New 2010-01-19
Duplicated by Feature #4577: convert text file attached an issue to utf-8. Closed 2010-01-14
Duplicated by Defect #3652: Unicode Support for TXT-Files Closed 2009-07-22

Associated revisions

Revision 7823
Added by Toshi MARUYAMA almost 3 years ago

attachment: add a functional test to show UTF-8 text file (#2371)

Revision 7824
Added by Toshi MARUYAMA almost 3 years ago

attachment: add a functional test to show invalid UTF-8 text file (#2371)

Stripping invalid UTF-8 is Redmine 1.2 behaviour.

Revision 7825
Added by Toshi MARUYAMA almost 3 years ago

move repositories helper to_utf8 logic to lib/redmine/codeset_util.rb for common use (#2371)

Revision 7828
Added by Toshi MARUYAMA almost 3 years ago

move repositories helper to_utf8 tests to new CodesetUtilTest (#2371)

Revision 7866
Added by Toshi MARUYAMA almost 3 years ago

attachment: use repositories setting to convert contents character encoding (#2371)

This commit results replacing invalid encoding instead to stripping.

Revision 7867
Added by Toshi MARUYAMA almost 3 years ago

attachment: add a functional test to show an ISO-8859-1 patch (#2371)

Revision 7868
Added by Toshi MARUYAMA almost 3 years ago

attachment: add a functional test to show an ISO-8859-1 content file (#2371)

Revision 7869
Added by Toshi MARUYAMA almost 3 years ago

attachment: move repositories encodings setting to the general tab and update the label (#2371)

Revision 7870
Added by Toshi MARUYAMA almost 3 years ago

update Japanese translation of attachments and repositories encodings setting label (#2371)

Revision 7871
Added by Toshi MARUYAMA almost 3 years ago

scm: attachment: remove "to_utf8" methods from helpers (#2371)

It is confusing that same name methods are in several helpers.

Revision 7873
Added by Toshi MARUYAMA almost 3 years ago

attachment: update a functional test to switch "side by side" and "inline" for ISO-8859-1 patches (#2371, #9612)

Revision 7885
Added by Toshi MARUYAMA almost 3 years ago

attachment: add missing diff type at functional tests (#2371, #9612)

History

#1 Updated by Yuya Nishihara about 4 years ago

youngseok yi wrote:

  • follow encoding of repository.

Attached patch implements it with minimal changes. attachment-encoding.patch

Proper solution will be something like:
  1. move to_utf8 to separate module, e.g. RepoFilesHelper
  2. make AttachmentsHelper and RepositoriesHelper include RepoFilesHelper

#2 Updated by Toshi MARUYAMA over 3 years ago

  • Assignee set to Toshi MARUYAMA

#3 Updated by Toshi MARUYAMA over 3 years ago

  • Target version set to Candidate for next major release

#4 Updated by Toshi MARUYAMA about 3 years ago

  • Target version changed from Candidate for next major release to 1.3.0

#5 Updated by Toshi MARUYAMA almost 3 years ago

  • Subject changed from encoding for diff or patch attachment file to encoding for attachment file

#6 Updated by Toshi MARUYAMA almost 3 years ago

  • Subject changed from encoding for attachment file to character encoding for attachment file

#7 Updated by Etienne Massip almost 3 years ago

Toshi, won't your last commit prevent me from attaching an iso8859-1 encoded patch to this issue and seeing it fine?

#8 Updated by Toshi MARUYAMA almost 3 years ago

Etienne Massip wrote:

Toshi, won't your last commit prevent me from attaching an iso8859-1 encoded patch to this issue and seeing it fine?

This feature issue goal is that attachment file and patch encoding are converted by repositories setting.

#9 Updated by Etienne Massip almost 3 years ago

I'm not sure this is a good idea; repositories may return data using a specific encoding, but attachments are usually stored on FS without transformation, so assuming that they're "very likely to be encoded the same way data in SCM is" is not necessarily true.

For example, my encoding list starts with UTF-8 and my locale (Fr) would assume that files uploaded by users are probably encoded in ISO-8859-15/CP1252; so assuming that the text files uploaded are in UTF-8 mean that they will be rendered stripped and that I will probably often loose some chars, which is the actual situation.

I would prefer to be able to specify a distinct default encoding for text attachments which would be ISO-8859-15/CP1252 (could be defaulted to default server encoding) and render with something like bom_present?(str) ? str : Iconv.conv('UTF-8', Setting.default_encoding).

#10 Updated by Toshi MARUYAMA almost 3 years ago

UTF-8 is very strict.
It is very rare case that miss understanding ISO-8859-1 characters as UTF-8.
http://groups.google.com/group/thg-dev/browse_thread/thread/6c258628e3fce8/09e9dbe4a030e51d

#11 Updated by Toshi MARUYAMA almost 3 years ago

Redmine 1.2.2 repository converting encoding is this line.
source:tags/1.2.2/app/helpers/repositories_helper.rb#L140

In case of "UTF-8,ISO-8859-1",
if converting error in "UTF-8", Redmine converts from ISO-8859-1.

Japanese use three encoding, UTF-8, EUC-JP and Shift-JIS (CP932).
This Redmine feature is big advantage in Japan.

#12 Updated by Etienne Massip almost 3 years ago

So if I understand well, according to encoding list order, it will try and fail to convert the ISO-8859-1 file from UTF-8 to UTF-8 and then will try and success to convert it from ISO-8859-1 to UTF-8?

Guess it will work...

#13 Updated by Etienne Massip almost 3 years ago

What if the administrator does not set UTF-8 at the start of the list?
Can't you str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings)?

#14 Updated by Toshi MARUYAMA almost 3 years ago

Etienne Massip wrote:

repositories may return data using a specific encoding,

It is not true.
SCMs does not have encoding information (meta data) of file contents.
http://mercurial.selenic.com/wiki/EncodingStrategy?action=recall&rev=21#Unknown_byte_strings

#15 Updated by Etienne Massip almost 3 years ago

Toshi MARUYAMA wrote:

It is not true.
SCMs does not have encoding information (meta data) of file contents.

Well, that's why I said may :-)

#16 Updated by Toshi MARUYAMA almost 3 years ago

Etienne Massip wrote:

What if the administrator does not set UTF-8 at the start of the list?

This is very rare case in Japan.
It is popular "UTF-8,EUC-JP,Shift_JIS in Japan.
This order is strict order.

If Single Byte Character Set (e.g. ISO-8859-1) is the start of the list, all characters are converted to UTF-8.
But, I think this is very rare case in the whole world.

Can't you str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings)?

Default repository encoding setting is empty.
This is equivalent that default is UTF-8.
And I think it is better that administrator set UTF-8 in the start of the list explicitly.

#17 Updated by Toshi MARUYAMA almost 3 years ago

  • % Done changed from 0 to 100

#18 Updated by Anton Statutov almost 3 years ago

Is this feature fixes #4608?

#19 Updated by Mischa The Evil almost 3 years ago

Anton Statutov wrote:

Is this feature fixes #4608?

I don't think so.

#20 Updated by Toshi MARUYAMA almost 3 years ago

  • Status changed from New to Closed
  • Resolution set to Fixed

Committed in r7885.

Also available in: Atom PDF