Defect #12641: Diff outputs become ??? in some non ASCII words. - Redmine

Added by Toshi MARUYAMA over 11 years ago. Updated about 11 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Toshi MARUYAMA

Category:

I18n

Target version:

2.3.0

Resolution:

Fixed

Affected version:

2.1.4

Description

An example is r11052 in #12640#note-2.

Files

diff-r11052.png (20 KB), Toshi MARUYAMA, 2012-12-19 09:30
unified_diff.rb.diff (787 Bytes), Correct UTF-8 parsing, Filou Centrinov, 2013-03-05 00:16
unified_diff.rb.2.diff (621 Bytes), Set utf-8 encoding, Filou Centrinov, 2013-03-05 12:54

Updated by Filou Centrinov about 11 years ago

File unified_diff.rb.diff added

The problem is that, for example, the following diff lines

- часа" 
+ часов"

are parsed in Redmine as UTF-8 like this:

\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xB0</span>&quot;
\xD1\x87\xD0\xB0\xD1\x81\xD0<span>\xBE\xD0\xB2</span>&quot;

This is wrong, because the leading byte \xD0 is part of the Cyrillic 2-byte character "а" in the <span> tag, but it is actually placed outside of the <span> tag. Therefore characters are misinterpreted and displayed as "?".
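The effect can be reproduced outside Redmine. The following is only a minimal sketch, not Redmine's actual code: `common_prefix_bytes` is a hypothetical helper that compares the two lines byte by byte, and the split it produces cuts the second "а" in half.

```ruby
# Hypothetical helper (not part of Redmine): length of the common
# prefix of two strings, counted in bytes rather than characters.
def common_prefix_bytes(a, b)
  n = 0
  n += 1 while n < a.bytesize && n < b.bytesize && a.getbyte(n) == b.getbyte(n)
  n
end

old_line = "\u0447\u0430\u0441\u0430\""        # 'часа"'
new_line = "\u0447\u0430\u0441\u043E\u0432\""  # 'часов"'

n = common_prefix_bytes(old_line, new_line)
prefix = old_line.byteslice(0, n)
rest   = old_line.byteslice(n, old_line.bytesize - n)

puts n                       # prints 7: the shared bytes end with \xD0
puts prefix.valid_encoding?  # prints false: dangling leading byte \xD0
puts rest.valid_encoding?    # prints false: starts with the lone byte \xB0
```

Both halves are invalid UTF-8 on their own, which is why the highlighted output shows "?".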

Correct UTF-8 would be:

\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xB0</span>&quot;
\xD1\x87\xD0\xB0\xD1\x81<span>\xD0\xBE\xD0\xB2</span>&quot;

So for the first line we have "...\xD0<span>\xB0..." instead of "...<span>\xD0\xB0...". The attached patch searches for the last leading byte if the first non-matching byte is a continuation byte (and not a leading byte or a single-byte character).

A continuation byte has the binary format 10xxxxxx, so we can detect it with myContinuationByte.ord.between?(128, 191).
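As an illustration of that idea, here is a minimal sketch. It is not the attached patch itself: `continuation_byte?` and `align_to_char_start` are hypothetical names, and the offset 7 is the position where a byte-wise comparison of the two example lines stops.

```ruby
# True if +byte+ (an Integer in 0..255) is a UTF-8 continuation byte,
# i.e. has the bit pattern 10xxxxxx (128..191).
def continuation_byte?(byte)
  byte.between?(128, 191)
end

# Hypothetical helper: move a byte offset backwards until it no longer
# points at a continuation byte, so that it sits on a character boundary.
def align_to_char_start(str, offset)
  offset -= 1 while offset > 0 && offset < str.bytesize &&
                    continuation_byte?(str.getbyte(offset))
  offset
end

line  = "\u0447\u0430\u0441\u0430\""   # 'часа"'
split = align_to_char_start(line, 7)   # byte 7 is \xB0, a continuation byte
head  = line.byteslice(0, split)
tail  = line.byteslice(split, line.bytesize - split)

puts split                 # prints 6: moved back onto the leading byte \xD0
puts head.valid_encoding?  # prints true
puts tail.valid_encoding?  # prints true
```

With the split moved back by one byte, the highlighted part starts at \xD0 and both pieces remain valid UTF-8.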

This problem always occurs when the first differing bytes of the two lines are continuation bytes. Another example, in Japanese, can be found in #13350.
