Defect #11089

UTF-8 encoding not showing correctly when looking highlighted file contents

Added by Troex Nevelin 12 months ago. Updated about 1 month ago.

Status: Confirmed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Text formatting
Target version: Candidate for next minor release
Affected version: 2.0.1
Resolution:

Description

Ruby version              1.9.3 (x86_64-linux)
RubyGems version          1.8.11
Rack version              1.4
Rails version             3.2.5
Database adapter          mysql2
Database schema version   20120422150750
Git version               1.7.2.5

When I request a file to view its contents (repository/revisions/HASH/entry), I get '???' instead of the UTF-8 text.
I'm using Git SCM and my files are valid UTF-8 (without BOM). I have this problem with Chinese, Russian, Thai and other non-Latin scripts.
However, viewing diffs and attached UTF-8 files works fine.

utf-8-not-shown-in-file-contents-view.png (50.4 KB) Troex Nevelin, 2012-06-04 19:13

diff-view-is-okay.png (56.6 KB) Troex Nevelin, 2012-06-04 19:13

gh-new-d7e2a66d.png (53.8 KB) Toshi MARUYAMA, 2012-06-05 15:13

git-show-component.php.txt (2.61 KB) Troex Nevelin, 2012-06-05 15:36

git-show-component.php - the same file but with different extension (2.61 KB) Troex Nevelin, 2012-06-05 15:39

issue-attached-files.png (15.2 KB) Troex Nevelin, 2012-06-05 15:45

test-file-with-php-ext.png (145 KB) Troex Nevelin, 2012-06-05 15:45

test-file-with-txt-ext.png (93.5 KB) Troex Nevelin, 2012-06-05 15:45

def.php (31 Bytes) gehao liu, 2012-06-11 11:09

def.py (38 Bytes) gehao liu, 2012-06-11 11:09

def.txt (31 Bytes) gehao liu, 2012-06-11 11:09


Related issues

Duplicated by Defect #11131: repository View and Annotate code Utf-8 show ??? ,diff is... Closed

History

#1 Updated by Etienne Massip 12 months ago

Did you set any value in the Attachments and repositories encodings setting (in Administration/Settings General tab)?

If not, could you try setting it?

#2 Updated by Troex Nevelin 12 months ago

Yes, I have tried setting it to UTF-8, but it has no effect.

#3 Updated by Toshi MARUYAMA 12 months ago

  • Subject changed from UTF-8 encoding not showing correctly when looking file contents to Git: encoding not showing correctly when looking file contents

#4 Updated by Toshi MARUYAMA 12 months ago

Redmine uses "git show".
source:tags/2.0.1/lib/redmine/scm/adapters/git_adapter.rb#L372

In Git 1.7.3.4, "git show --help" says:

The contents of the blob objects are uninterpreted sequences of bytes. 
There is no encoding translation at the core level.
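
To illustrate what "uninterpreted sequences of bytes" means on the Ruby side, here is a minimal sketch. It is not Redmine code: `describe_bytes` and `SAMPLE_BYTES` are hypothetical names, and the sample bytes stand in for whatever "git show" would write to a pipe. It shows that the same bytes can carry a binary label or a UTF-8 label without any byte being changed.

```ruby
# Hypothetical helper (not Redmine code): retags a byte string without
# changing any bytes and reports what each encoding label makes of it.
def describe_bytes(raw)
  binary = raw.dup.force_encoding(Encoding::BINARY) # BINARY is ASCII-8BIT
  utf8   = raw.dup.force_encoding(Encoding::UTF_8)
  {
    :encoding_as_read => binary.encoding,
    :byte_count       => binary.bytesize,
    :valid_utf8       => utf8.valid_encoding?,
    :char_count       => utf8.valid_encoding? ? utf8.length : nil
  }
end

# Assumed sample: the UTF-8 bytes of a six-letter Russian word,
# labelled as binary the way bytes read from a pipe might be.
SAMPLE_BYTES = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"
               .dup.force_encoding(Encoding::BINARY)

info = describe_bytes(SAMPLE_BYTES)
puts "bytes: #{info[:byte_count]}, valid UTF-8: #{info[:valid_utf8]}, " \
     "characters: #{info[:char_count]}"
```

The point of the sketch is only that the encoding is a label attached by the reader of the bytes, so the consumer of "git show" output has to decide which label to apply.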

#5 Updated by Troex Nevelin 12 months ago

I understand that Git stores files in binary form, but running this from the console:

git show --no-color HEAD:.../lang/ru/component.php

returns valid UTF-8 text. As I understand it, Redmine tries to guess the encoding and sanitise the content, making sure no invalid characters are passed to the view.

For example, source:trunk/config/locales/ja.yml is displayed correctly (but it uses SVN).

I think there is an encoding-guess problem in source:tags/2.0.1/lib/redmine/codeset_util.rb#L84: does calling .to_utf8_by_setting_internal(str) set the ASCII-8BIT encoding on line 94?
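
The general idea of "guess the encoding, then sanitise" can be sketched as follows. This is an assumption-laden illustration, not the code in codeset_util.rb: `to_utf8_guess` and its `candidates` parameter are hypothetical, and the fallback that substitutes "?" is just one way such a routine could produce '???' output, not a confirmed cause of this ticket.

```ruby
# Hypothetical sketch (not Redmine's implementation): try each candidate
# encoding in order and return the first valid interpretation as UTF-8.
def to_utf8_guess(bytes, candidates = ["UTF-8"])
  candidates.each do |name|
    str = bytes.dup.force_encoding(name)
    next unless str.valid_encoding?
    return str.encode("UTF-8")
  end
  # Last resort: treat the input as binary, keep the ASCII bytes and
  # replace every other byte with "?".
  bytes.dup.force_encoding(Encoding::BINARY)
       .encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")
end

utf8_bytes   = "\xD0\x9F".dup.force_encoding(Encoding::BINARY) # one Cyrillic letter in UTF-8
cp1251_bytes = "\xCF\xF0".dup.force_encoding(Encoding::BINARY) # two Cyrillic letters in Windows-1251

puts to_utf8_guess(utf8_bytes).encoding
puts to_utf8_guess(cp1251_bytes, ["UTF-8", "Windows-1251"]).length
puts to_utf8_guess(cp1251_bytes) # no candidate matches, so only "?" remains
```

With a routine shaped like this, valid UTF-8 input passes through the first candidate untouched, so a wrong result for valid UTF-8 would have to come from a later stage.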

#6 Updated by Toshi MARUYAMA 12 months ago

I cannot reproduce.
https://github.com/redmine/redmine/commit/d7e2a66d

Could you attach this "git show" output file?

#7 Updated by Troex Nevelin 12 months ago

git show --no-color HEAD:.../lang/ru/component.php > git-show-component.php.txt

I'm running Redmine on Debian 6 with ruby 1.9.3p125 (2012-02-16) [x86_64-linux], a package compiled from the Debian Ruby repository, using the Unicorn Rack server.

I'm almost sure this problem is local to my setup. Can you guide me on how to debug it? I'm familiar with Ruby and RoR. I tried to output the raw content in app/views/common/_file.html.erb, but it gives me an ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT) error.
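
The "incompatible character encodings" part of that error can be reproduced in plain Ruby, outside Rails. The sketch below assumes the template is UTF-8 and the raw content is still labelled as binary; `concat_error` is a hypothetical helper, and the strings are stand-ins for the template text and the file content.

```ruby
# Hypothetical helper: returns the error message raised when two strings
# cannot be concatenated, or nil when concatenation succeeds.
def concat_error(left, right)
  left + right
  nil
rescue Encoding::CompatibilityError => e
  e.message
end

template_text = "\u0424\u0430\u0439\u043B: "                        # UTF-8, non-ASCII
raw_content   = "\xD0\x9F".dup.force_encoding(Encoding::BINARY)     # binary, non-ASCII

# Both sides contain non-ASCII bytes under different labels, so Ruby refuses.
puts concat_error(template_text, raw_content).inspect

# Relabelling the raw bytes as UTF-8 (they are valid UTF-8) removes the conflict.
puts concat_error(template_text, raw_content.dup.force_encoding(Encoding::UTF_8)).inspect
```

Ruby only raises when both strings contain non-ASCII bytes, which would fit the observation that Latin-only files are unaffected.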

#8 Updated by Troex Nevelin 12 months ago

#9 Updated by Troex Nevelin 12 months ago

I've made one more test on my setup: I attached the same file to an issue with two different extensions, .txt and .php. When I try to view the attached files, the problem appears with the syntax-highlighted file. So this is not only a Git-related problem.

But there is no such problem here in this ticket.

# grep coderay Gemfile.lock 
    coderay (1.0.6)
  coderay (~> 1.0.6)

#10 Updated by Toshi MARUYAMA 12 months ago

  • Subject changed from Git: encoding not showing correctly when looking file contents to UTF-8 encoding not showing correctly when looking file contents
  • Category deleted (SCM)

#11 Updated by gehao liu 11 months ago

  • Status changed from New to Resolved

Same problem here!
Files with a .txt extension are OK,
Python files (.py) are OK,
but files with a .php extension are wrong.
The problem is in viewing syntax-highlighted files!

#12 Updated by gehao liu 11 months ago

#13 Updated by Toshi MARUYAMA 11 months ago

  • Status changed from Resolved to New

#14 Updated by gehao liu 11 months ago

This is a bug in CodeRay 1.0.6; it affects only PHP files.

#15 Updated by András Kolesár about 1 month ago

The CodeRay PHP encoding issue has been solved:
https://github.com/rubychan/coderay/issues/40

I checked, and it works fine with the updated coderay/scanners/php.rb file.

#16 Updated by Etienne Massip about 1 month ago

  • Subject changed from UTF-8 encoding not showing correctly when looking file contents to UTF-8 encoding not showing correctly when looking highlighted file contents
  • Category set to Text formatting
  • Status changed from New to Confirmed
  • Target version set to Candidate for next minor release

Upgrade the coderay dependency to 1.0.9 or 1.1.
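
As a quick check of what the existing constraint already permits, the sketch below evaluates the `~> 1.0.6` requirement shown in the Gemfile.lock excerpt from comment #9 against the two suggested versions. It assumes that constraint is still the one in effect; `COD_CONSTRAINT` and `allowed_by_constraint?` are hypothetical names used only here.

```ruby
require "rubygems"

# Assumption: this mirrors the "coderay (~> 1.0.6)" line quoted in comment #9.
COD_CONSTRAINT = Gem::Requirement.new("~> 1.0.6")

# Hypothetical helper: true when the version string satisfies the constraint.
def allowed_by_constraint?(version)
  COD_CONSTRAINT.satisfied_by?(Gem::Version.new(version))
end

["1.0.6", "1.0.9", "1.1.0"].each do |version|
  puts "#{version}: #{allowed_by_constraint?(version)}"
end
```

Under that assumption, 1.0.9 falls inside the existing constraint, while moving to 1.1 would require relaxing it.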
