Defect #12641
Diff outputs become ??? in some non ASCII words.
| Status: | Closed | Start date: | |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | I18n | | |
| Target version: | 2.3.0 | | |
| Resolution: | Fixed | Affected version: | 2.1.4 |
Associated revisions
remove unnecessary h() from diff filename (#12641)
On Rails3, escaping is default.
move utf8 encoding from view to UnifiedDiff (#12641)
code cleanup (#12641)
set html encoding utf8 at Diff class (#12641)
fix that diff outputs become ??? in some non ASCII words (#12641)
Contributed by Filou Centrinov.
svn propset svn:eol-style native to fixtures (#12641)
2.3-stable: svn propset svn:eol-style native to fixtures (#12641)
History
#1
Updated by Filou Centrinov about 9 years ago
- File unified_diff.rb.diff added
The problem is that, for example, the following diff lines
- часа"
+ часов"
are parsed in Redmine as UTF-8 like this:
\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xB0</span>"
\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xBE\xD0\xB2</span>"
This is wrong because the leading byte \xD0 is part of the Cyrillic 2-byte character "а" that belongs inside the <span> tag, but it ends up outside the <span> tag. As a result, characters are misinterpreted and displayed as "?".
Correct UTF-8 would be:
\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xB0</span>"
\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xBE\xD0\xB2</span>"
So the first line should contain "...<span>\xD0\xB0</span>..." instead of "...\xD0<span>\xB0</span>...". The attached patch searches backwards for the last leading byte if the first differing byte is a continuation byte (and not a leading byte or a single-byte character).
A continuation byte has the binary format 10xxxxxx, so we can detect it with myContinuationByte.ord.between?(128, 191).
This problem occurs whenever the first differing bytes of the two lines are continuation bytes. Another example, in Japanese, can be found in #13350.
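The idea of stepping back to the leading byte can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the method names `continuation_byte?` and `common_prefix_bytes` are made up for this example, and both lines are assumed to contain valid UTF-8.

```ruby
# Hypothetical helpers for illustration; not taken from unified_diff.rb.

# True if the byte has the form 10xxxxxx (a UTF-8 continuation byte).
def continuation_byte?(byte)
  byte.between?(128, 191)
end

# Length in bytes of the common prefix of two lines, moved back so that
# it never ends in the middle of a multi-byte character.
def common_prefix_bytes(a, b)
  a_bytes = a.bytes
  b_bytes = b.bytes
  max = [a_bytes.size, b_bytes.size].min
  i = 0
  # Plain byte-wise comparison, as in the buggy behaviour described above.
  i += 1 while i < max && a_bytes[i] == b_bytes[i]
  # If the first differing byte is a continuation byte, go back to the
  # leading byte of that character.
  i -= 1 while i > 0 && i < a_bytes.size && continuation_byte?(a_bytes[i])
  i
end

old_line = "часа\""
new_line = "часов\""
split_at = common_prefix_bytes(old_line, new_line)
unchanged = old_line.byteslice(0, split_at) # the part left outside the <span>
```

For the two lines above, the byte-wise comparison alone would stop after \xD0, in the middle of a character. The extra step moves the split point back by one byte so that \xD0 stays together with its continuation byte inside the <span>.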
#2
Updated by Filou Centrinov about 9 years ago
- File unified_diff.rb.2.diff added
A much better way to fix this problem is to set a UTF-8 encoding. :-)
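A minimal sketch of why setting the encoding helps, assuming the diff lines arrive as binary strings that hold valid UTF-8 bytes. The method name `common_prefix_chars` is hypothetical and this is not the code of the second patch. Once a string is tagged as UTF-8, Ruby indexes it by character, so a character-wise comparison cannot split a multi-byte character.

```ruby
# Hypothetical illustration; not taken from unified_diff.rb.

# Number of leading characters (not bytes) shared by two lines.
def common_prefix_chars(a, b)
  # Assumption: both inputs contain valid UTF-8 bytes.
  a = a.dup.force_encoding('UTF-8')
  b = b.dup.force_encoding('UTF-8')
  i = 0
  i += 1 while i < a.length && i < b.length && a[i] == b[i]
  i
end

# The two example lines as raw bytes, tagged as binary (ASCII-8BIT).
old_line = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xB0\"".b
new_line = "\xD1\x87\xD0\xB0\xD1\x81\xD0\xBE\xD0\xB2\"".b

n = common_prefix_chars(old_line, new_line)
unchanged = old_line.dup.force_encoding('UTF-8')[0, n]
```

Here the common prefix is counted in characters, so the split always falls on a character boundary and the leading byte \xD0 stays with its continuation byte.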
#3
Updated by Filou Centrinov about 9 years ago
The affected version is also 2.3 (devel)
#4
Updated by Toshi MARUYAMA about 9 years ago
- Category set to I18n
- Assignee set to Toshi MARUYAMA
- Target version set to 2.4.0
#5
Updated by Toshi MARUYAMA about 9 years ago
- Target version changed from 2.4.0 to 2.3.0
#6
Updated by Toshi MARUYAMA about 9 years ago
- Status changed from New to Closed
- Resolution set to Fixed
Committed in, thanks.