Feature #2371
character encoding for attachment file
| Status: | Closed | Start date: | 2008-12-22 |
|---|---|---|---|
| Priority: | Low | Due date: | |
| Assignee: | Toshi MARUYAMA | % Done: | 100% |
| Category: | Attachments | | |
| Target version: | 1.3.0 | | |
| Resolution: | Fixed | | |
Description
As of r814, the default encoding for repositories can be configured.
Diff or patch attachments require a similar configuration. Two options:
- a default encoding for diff or patch attachments (Admin -> Settings -> Attachment -> diff/patch encodings ?).
- follow the encoding of the repository. (source:/trunk/app/helpers/repositories_helper.rb@1900#L109)
I think the 2nd option may be enough and useful.
Associated revisions
attachment: add a functional test to show UTF-8 text file (#2371)
attachment: add a functional test to show invalid UTF-8 text file (#2371)
Stripping invalid UTF-8 is Redmine 1.2 behaviour.
move repositories helper to_utf8 logic to lib/redmine/codeset_util.rb for common use (#2371)
move repositories helper to_utf8 tests to new CodesetUtilTest (#2371)
attachment: use repositories setting to convert contents character encoding (#2371)
This commit results in invalid byte sequences being replaced instead of stripped.
attachment: add a functional test to show an ISO-8859-1 patch (#2371)
attachment: add a functional test to show an ISO-8859-1 content file (#2371)
attachment: move repositories encodings setting to the general tab and update the label (#2371)
update Japanese translation of attachments and repositories encodings setting label (#2371)
scm: attachment: remove "to_utf8" methods from helpers (#2371)
It is confusing to have methods with the same name in several helpers.
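The difference between the two behaviours mentioned in the revisions above (stripping invalid UTF-8 versus replacing it) can be illustrated with a minimal Ruby sketch. The helper names and the "?" replacement character are assumptions made for illustration; they are not taken from the Redmine source.

```ruby
# Hypothetical helpers, not Redmine code.

# Drops every byte sequence that is not valid UTF-8.
def strip_invalid_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# Substitutes "?" (an assumed replacement) for every invalid byte sequence.
def replace_invalid_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).scrub('?')
end

latin1 = "caf\xE9 ok".b   # "café ok" encoded as ISO-8859-1

puts strip_invalid_utf8(latin1)    # caf ok
puts replace_invalid_utf8(latin1)  # caf? ok
```

With replacement, the reader can at least see that a character was lost at that position.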
History
#1 Updated by Yuya Nishihara almost 2 years ago
- File attachment-encoding.patch added
youngseok yi wrote:
- follow encoding of repository.
The attached patch implements it with minimal changes: attachment-encoding.patch
A proper solution would be something like:
- move to_utf8 to a separate module, e.g. RepoFilesHelper
- make AttachmentsHelper and RepositoriesHelper include RepoFilesHelper
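The refactoring proposed in this comment could look roughly like the sketch below. The module name RepoFilesHelper comes from the comment; the body of to_utf8 is a placeholder assumption and is not the real Redmine conversion logic. According to the associated revisions, the logic was eventually moved to lib/redmine/codeset_util.rb instead.

```ruby
# Shared module, named as suggested in the comment (hypothetical).
module RepoFilesHelper
  # Placeholder body: treats the input as UTF-8 and replaces
  # invalid bytes with "?". The real conversion is more involved.
  def to_utf8(str)
    str.dup.force_encoding(Encoding::UTF_8).scrub('?')
  end
end

# Both helpers get the same to_utf8 through the mix-in,
# so there is only one definition to maintain.
module AttachmentsHelper
  include RepoFilesHelper
end

module RepositoriesHelper
  include RepoFilesHelper
end
```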
#2 Updated by Toshi MARUYAMA 12 months ago
- Assignee set to Toshi MARUYAMA
#3 Updated by Toshi MARUYAMA 12 months ago
- Target version set to Candidate for next major release
#4 Updated by Toshi MARUYAMA 12 months ago
- Target version changed from Candidate for next major release to 1.3.0
#5 Updated by Toshi MARUYAMA 6 months ago
- Subject changed from encoding for diff or patch attachment file to encoding for attachment file
#6 Updated by Toshi MARUYAMA 6 months ago
- Subject changed from encoding for attachment file to character encoding for attachment file
#7 Updated by Etienne Massip 6 months ago
Toshi, won't your last commit prevent me from attaching an iso8859-1 encoded patch to this issue and seeing it fine?
#8 Updated by Toshi MARUYAMA 6 months ago
- File general-settings.png added
Etienne Massip wrote:
Toshi, won't your last commit prevent me from attaching an iso8859-1 encoded patch to this issue and seeing it fine?
The goal of this feature is that attachment file and patch contents are converted according to the repositories encodings setting.
#9 Updated by Etienne Massip 6 months ago
I'm not sure this is a good idea; repositories may return data using a specific encoding, but attachments are usually stored on the file system without transformation, so the assumption that they are "very likely to be encoded the same way data in SCM is" does not necessarily hold.
For example, my encoding list starts with UTF-8, but given my locale (fr), files uploaded by users are probably encoded in ISO-8859-15/CP1252; so assuming that uploaded text files are in UTF-8 means that they will be rendered stripped and that I will probably often lose some characters, which is the current situation.
I would prefer to be able to specify a distinct default encoding for text attachments, which would be ISO-8859-15/CP1252 (it could default to the server's default encoding), and render with something like bom_present?(str) ? str : Iconv.conv('UTF-8', Setting.default_encoding).
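A rough sketch of the rendering rule suggested in this comment follows. It uses String#encode rather than Iconv, which is no longer part of Ruby's standard library. The names bom_present? and render_text and the default_encoding parameter are hypothetical stand-ins for the setting mentioned above.

```ruby
# The three bytes of the UTF-8 byte order mark.
UTF8_BOM = "\xEF\xBB\xBF".b

# True when the raw bytes start with a UTF-8 BOM.
def bom_present?(bytes)
  bytes.b.start_with?(UTF8_BOM)
end

# Hypothetical renderer: trusts a BOM as a sign of UTF-8, otherwise
# converts from the configured default encoding.
def render_text(bytes, default_encoding)
  if bom_present?(bytes)
    bytes.dup.force_encoding(Encoding::UTF_8)
  else
    bytes.dup.force_encoding(default_encoding)
         .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
end

puts render_text("caf\xE9".b, 'ISO-8859-15')  # café
```

Many UTF-8 files carry no BOM, so this rule alone would treat them as the default encoding. That is the case the ordered encoding list discussed in the following comments tries to cover.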
#10 Updated by Toshi MARUYAMA 6 months ago
UTF-8 is very strict.
It is very rare for ISO-8859-1 text to be misinterpreted as valid UTF-8.
http://groups.google.com/group/thg-dev/browse_thread/thread/6c258628e3fce8/09e9dbe4a030e51d
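The strictness claim can be checked with a short Ruby sketch. The helper name valid_utf8? is an assumption made for this example.

```ruby
# Returns true when the given bytes form a well-formed UTF-8 string.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

latin1 = "caf\xE9".b      # "café" in ISO-8859-1: é is the single byte 0xE9
utf8   = "caf\xC3\xA9".b  # "café" in UTF-8: é is the two bytes 0xC3 0xA9

# 0xE9 opens a multi-byte UTF-8 sequence that is never completed.
puts valid_utf8?(latin1)  # false
puts valid_utf8?(utf8)    # true
```

Any ISO-8859-1 text with an accented character that is not followed by bytes in the 0x80-0xBF range fails validation the same way.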
#11 Updated by Toshi MARUYAMA 6 months ago
In Redmine 1.2.2, the repository encoding conversion is done at this line:
source:tags/1.2.2/app/helpers/repositories_helper.rb#L140
With the setting "UTF-8,ISO-8859-1",
if conversion from UTF-8 fails, Redmine converts from ISO-8859-1.
Japanese users use three encodings: UTF-8, EUC-JP and Shift-JIS (CP932).
This Redmine feature is a big advantage in Japan.
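The fallback described in this comment can be sketched as follows. This is an illustration only, not the code at the referenced line: the helper name, the use of String#encode instead of Iconv, and the "?" replacement for undecodable bytes are all assumptions.

```ruby
# Hypothetical helper: tries each configured encoding in order and
# returns the first successful conversion to UTF-8.
def to_utf8_with_fallback(bytes, encodings)
  encodings.each do |name|
    candidate = bytes.dup.force_encoding(name)
    # Skip encodings under which the bytes are not well-formed.
    next unless candidate.valid_encoding?
    begin
      return candidate.encode(Encoding::UTF_8)
    rescue EncodingError
      next # conversion failed, try the next encoding in the list
    end
  end
  # Nothing matched: keep the text and replace the bad bytes.
  bytes.dup.force_encoding(Encoding::UTF_8).scrub('?')
end

setting = %w[UTF-8 ISO-8859-1]
puts to_utf8_with_fallback("caf\xC3\xA9".b, setting)  # café (already UTF-8)
puts to_utf8_with_fallback("caf\xE9".b, setting)      # café (converted from ISO-8859-1)
```

The same loop works for a list such as UTF-8, EUC-JP, Shift_JIS, as long as the stricter encodings come first.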
#12 Updated by Etienne Massip 6 months ago
So if I understand correctly, following the order of the encoding list, it will try and fail to convert the ISO-8859-1 file from UTF-8 to UTF-8, and then will try and succeed in converting it from ISO-8859-1 to UTF-8?
Guess it will work...
#13 Updated by Etienne Massip 6 months ago
What if the administrator does not set UTF-8 at the start of the list?
Can't you str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings)?
#14 Updated by Toshi MARUYAMA 6 months ago
Etienne Massip wrote:
repositories may return data using a specific encoding,
It is not true.
SCMs do not have encoding information (metadata) for file contents.
http://mercurial.selenic.com/wiki/EncodingStrategy?action=recall&rev=21#Unknown_byte_strings
#15 Updated by Etienne Massip 6 months ago
Toshi MARUYAMA wrote:
It is not true.
SCMs do not have encoding information (metadata) for file contents.
Well, that's why I said may :-)
#16 Updated by Toshi MARUYAMA 6 months ago
Etienne Massip wrote:
What if the administrator does not set UTF-8 at the start of the list?
This is a very rare case in Japan.
"UTF-8,EUC-JP,Shift_JIS" is the popular setting in Japan.
The order of the list is significant.
If a single-byte character set (e.g. ISO-8859-1) is at the start of the list, every byte sequence converts to UTF-8 successfully, so the later encodings are never tried.
But I think this is a very rare case anywhere in the world.
Can't you
str.is_utf8? ? str : try Iconv.conv('UTF-8', Setting.encodings)?
The default repository encodings setting is empty.
This is equivalent to a default of UTF-8.
And I think it is better for the administrator to set UTF-8 at the start of the list explicitly.
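To illustrate why the order of the list matters, the sketch below reports which encoding in a list is the first to accept a given byte string. The helper name first_matching_encoding is hypothetical.

```ruby
# Hypothetical helper: name of the first encoding in the list under
# which the bytes are well-formed, or nil if none matches.
def first_matching_encoding(bytes, encodings)
  encodings.find { |name| bytes.dup.force_encoding(name).valid_encoding? }
end

utf8_bytes = "caf\xC3\xA9".b  # "café" encoded as UTF-8

puts first_matching_encoding(utf8_bytes, %w[UTF-8 ISO-8859-1])  # UTF-8
puts first_matching_encoding(utf8_bytes, %w[ISO-8859-1 UTF-8])  # ISO-8859-1

# With ISO-8859-1 first, the two UTF-8 bytes of "é" are decoded
# as two separate Latin-1 characters.
puts utf8_bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')  # cafÃ©
```

Because every byte value is valid in ISO-8859-1, putting it first means UTF-8 is never reached.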
#17 Updated by Toshi MARUYAMA 6 months ago
- % Done changed from 0 to 100
#18 Updated by Anton Statutov 6 months ago
Does this feature fix #4608?
#19 Updated by Mischa The Evil 6 months ago
#20 Updated by Toshi MARUYAMA 6 months ago
- Status changed from New to Closed
- Resolution set to Fixed
Committed in r7885.
