Defect #2274

Filesystem Repository path encoding of non UTF-8 characters

Added by Toni Kerschbaum almost 9 years ago. Updated over 6 years ago.

Status: Closed
Start date: 2008-12-04
Priority: Normal
Due date:
Assignee: Toshi MARUYAMA
% Done: 100%
Category: SCM
Target version: 1.2.0
Resolution: Fixed
Affected version:

Description

The filesystem repository has problems with special characters in file and directory names.
If the name of a file or directory contains a special character (for example ü, ö or ä), the codepage gets mixed up
and the browser has trouble displaying it (see attached pictures). If one corrects the codepage manually,
the file names in the repository are displayed correctly, but every other special character (table headers, etc.) is garbled.

In addition, if a directory name contains a special character, the directory itself is browsable, but none of the subdirectories or files inside it are.
This is mainly because every '/' after the special character is converted to %2F, and every special character is percent-encoded as its Latin-1 byte (ü becomes %FC), which breaks the URI. The result is a 404 (Not Found) error. For example:

http://redmine.my.domain/repositories/browse/project/Brücke -> http://redmine.my.domain/repositories/browse/project/Br%FCcke
http://redmine.my.domain/repositories/browse/project/Brücke/übersicht.jpg -> http://redmine.my.domain/repositories/browse/project/Br%FCcke%2F%FCbersicht.jpg
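The two broken URLs above can be reproduced with a short Python sketch. It assumes (this is not taken from Redmine's Ruby code) that the path reached the URL helper as Latin-1 text and that the whole path was escaped as a single segment, so '/' was escaped too. Escaping from UTF-8 while keeping '/' as a separator gives the URL a UTF-8 application can route.

```python
from urllib.parse import quote, unquote

path = "Brücke/übersicht.jpg"

# Broken: the path is treated as Latin-1 and escaped as one segment,
# so 'ü' becomes %FC and '/' becomes %2F.
broken = quote(path, safe="", encoding="latin-1")

# Expected: UTF-8 bytes, with '/' kept as the path separator.
expected = quote(path, safe="/", encoding="utf-8")

print(broken)    # Br%FCcke%2F%FCbersicht.jpg
print(expected)  # Br%C3%BCcke/%C3%BCbersicht.jpg

# A UTF-8 application cannot decode the lone %FC byte; unquote() replaces
# it with U+FFFD, so the lookup uses a name that does not exist on disk.
print(unquote("Br%FCcke") == "Br\ufffdcke")  # True
```

This also suggests why restoring the '/' characters by hand is not enough: the remaining %FC escapes still do not decode to the on-disk name under UTF-8.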

Manually replacing every %2F with / does not really help: you get the Redmine page, but with an error saying that the file does not exist in the repository. Replacing every %-code with its special character leads to a 500 (Internal Server Error).

Setting the codepages under Administration -> Repositories -> Codepages does not affect this in any way. Settings tried:
  • UTF-8, ISO 8859-1, ISO 8859-15, CP1252
  • ISO-8859-1, ISO-8859-15, UTF-8, CP1252
  • ISO-8859-15, ISO-8859-1, UTF-8, CP1252
Tested under:
  • Redmine 0.7.3.devel.2079 (MySQL)
  • ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
  • Rails 2.1.0
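Per the report, these codepage settings had no effect on path names, but each setting is an ordered list of encodings. The sketch below assumes such a list is applied by trying each encoding in turn and keeping the first one that decodes cleanly; `decode_with_fallback` is a hypothetical helper, not Redmine's code. It also shows why the order matters: ISO-8859-1 maps every byte to some character, so nothing listed after it is ever tried.

```python
from typing import List, Optional


def decode_with_fallback(raw: bytes, encodings: List[str]) -> Optional[str]:
    """Hypothetical helper: decode raw with the first listed encoding that accepts it."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    return None  # none of the listed encodings could decode the bytes


latin1_name = "Brücke".encode("latin-1")  # b'Br\xfccke'
utf8_name = "Brücke".encode("utf-8")      # b'Br\xc3\xbccke'

# UTF-8 rejects the lone 0xFC byte, so ISO-8859-1 is used instead.
print(decode_with_fallback(latin1_name, ["UTF-8", "ISO-8859-1", "ISO-8859-15", "CP1252"]))  # Brücke

# With ISO-8859-1 first, the UTF-8 name is decoded byte by byte as Latin-1
# and comes out garbled; UTF-8 is never tried.
print(decode_with_fallback(utf8_name, ["ISO-8859-1", "UTF-8"]))  # BrÃ¼cke
```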

repo_enc_bug_01.gif - Encoding bug (Firefox 3.0.4 Encoding: Auto/UTF-8, Windows XP SP3) (20.5 KB) Toni Kerschbaum, 2008-12-13 20:09

repo_enc_bug_02.gif - Encoding bug (Firefox 3.0.4 Encoding: ISO-8859-1, Windows XP SP3) (20 KB) Toni Kerschbaum, 2008-12-13 20:09

fs-setting.png (13.2 KB) Toshi MARUYAMA, 2011-02-08 08:14

fs-browse.png (28.9 KB) Toshi MARUYAMA, 2011-02-08 08:14


Related issues

Related to Redmine - Defect #2664: Mercurial: Repository path encoding of non UTF-8 characters Closed 2009-02-04

Associated revisions

Revision 4899
Added by Toshi MARUYAMA over 6 years ago

scm: add CP932 at Setting::ENCODINGS (#2664, #2274).

CP932 is the Windows variant of Japanese Shift_JIS.
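A Japanese file name stored by Windows in CP932 has entirely different bytes from its UTF-8 form, which is why the repository needs to be told which encoding its paths use. A small Python illustration (the codec names are Python's, not necessarily the values in `Setting::ENCODINGS`):

```python
name = "日本"  # a Japanese file name

cp932 = name.encode("cp932")  # bytes as stored on a Japanese Windows system
utf8 = name.encode("utf-8")   # bytes a UTF-8 application expects

print(cp932)  # b'\x93\xfa\x96{'
print(utf8)   # b'\xe6\x97\xa5\xe6\x9c\xac'

# Decoding with the right codec round-trips the name ...
assert cp932.decode("cp932") == name

# ... but the same bytes are not valid UTF-8, so a repository that assumes
# UTF-8 paths can neither display nor look them up.
try:
    cp932.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)  # not UTF-8: invalid start byte
```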

Revision 4906
Added by Toshi MARUYAMA over 6 years ago

scm: add "scm_iconv" method for repository path encoding in abstract_adapter.rb (#2664, #2274).
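The revision above only names the method. As a rough idea of what converting a repository path between encodings involves, here is a hypothetical Python analogue; its argument order, byte-based interface and return-None-on-failure behavior are assumptions for illustration, not Redmine's actual implementation.

```python
from typing import Optional


def scm_iconv(to_enc: str, from_enc: str, data: Optional[bytes]) -> Optional[bytes]:
    """Hypothetical analogue of a path-encoding converter.

    Re-encodes `data` from `from_enc` to `to_enc`. Returns the input
    unchanged when both encodings are the same, and None when the bytes
    cannot be converted.
    """
    if data is None:
        return None
    if to_enc.lower() == from_enc.lower():
        return data  # nothing to convert
    try:
        return data.decode(from_enc).encode(to_enc)
    except (UnicodeDecodeError, UnicodeEncodeError, LookupError):
        return None  # invalid bytes, unmappable character, or unknown codec


# Repository path bytes in Latin-1 -> UTF-8 for display in the browser.
print(scm_iconv("UTF-8", "ISO-8859-1", b"Br\xfccke"))  # b'Br\xc3\xbccke'

# And back again, to build the on-disk path from a UTF-8 URL.
print(scm_iconv("ISO-8859-1", "UTF-8", b"Br\xc3\xbccke"))  # b'Br\xfccke'
```

Converting in both directions is the point: names read from disk must become UTF-8 for the page, and names arriving in a UTF-8 URL must be turned back into the repository's encoding before the file is opened.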

Revision 4907
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: refactor for path encoding (#2274).

Revision 4921
Added by Toshi MARUYAMA over 6 years ago

scm: Ruby 1.9 compatibility for browsing repository tree (#2664, #2274).

If the repository path is not UTF-8, Ruby 1.9 shows a stack trace.

Revision 4940
Added by Toshi MARUYAMA over 6 years ago

scm: add "path_encoding" column in repositories table (#2664, #2274).

Contributed by Yuya Nishihara.

Revision 4941
Added by Toshi MARUYAMA over 6 years ago

scm: update adapter initialize() to use path encoding (#2664, #2274).

Revision 4943
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: add path encoding select item (#2274).

Revision 4944
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: fix mistake in repository select box from r4943 (#2274).

Revision 4988
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: add note "Default: UTF-8" in path encoding setting (#2274).

Revision 5520
Added by Toshi MARUYAMA over 6 years ago

scm: use i18n string at path encoding setting (#2274, #2664, #3462, #5251).

Revision 5521
Added by Toshi MARUYAMA over 6 years ago

scm: add Japanese translation "field_scm_path_encoding" (#2274, #2664, #3462, #5251).

Revision 5522
Added by Toshi MARUYAMA over 6 years ago

scm: update locales for path encoding setting (#2274, #2664, #3462, #5251).

Revision 5523
Added by Toshi MARUYAMA over 6 years ago

scm: use i18n string at path encoding setting note (#2274, #2664, #3462, #5251).

Revision 5524
Added by Toshi MARUYAMA over 6 years ago

scm: add Japanese string at path encoding setting note (#2274, #2664, #3462, #5251).

Revision 5525
Added by Toshi MARUYAMA over 6 years ago

scm: update locales for path encoding setting note (#2274, #2664, #3462, #5251).

Revision 5629
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: fix loss of non-ASCII paths if path_encoding is '' (#2274).

Revision 5863
Added by Toshi MARUYAMA over 6 years ago

scm: add "path_encoding" method in abstract adapter (#2274, #3462, #2664, #5251).

Revision 5864
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: override "path_encoding" method in adapter (#2274).

Revision 5865
Added by Toshi MARUYAMA over 6 years ago

scm: filesystem: add unit adapter test of default path_encoding is UTF-8 (#2274).
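Revision 5629 fixes lost non-ASCII paths when path_encoding is blank, and revisions 5863 to 5865 make UTF-8 the filesystem adapter's default. A minimal sketch of that behavior, assuming a hypothetical `FilesystemAdapter` class (its names and structure are illustrative only, not Redmine's):

```python
import os
from typing import List, Optional


class FilesystemAdapter:
    """Hypothetical adapter that lists a directory using a configured path encoding."""

    DEFAULT_PATH_ENCODING = "UTF-8"

    def __init__(self, root: str, path_encoding: Optional[str] = None):
        self.root = root
        self._path_encoding = path_encoding

    @property
    def path_encoding(self) -> str:
        # A missing or blank setting falls back to UTF-8 instead of
        # leaving non-ASCII names undecodable.
        if not self._path_encoding or not self._path_encoding.strip():
            return self.DEFAULT_PATH_ENCODING
        return self._path_encoding

    def decode_name(self, raw: bytes) -> str:
        """Convert an on-disk file name to text for display."""
        return raw.decode(self.path_encoding, errors="replace")

    def encode_name(self, name: str) -> bytes:
        """Convert a name from the browser back to on-disk bytes."""
        return name.encode(self.path_encoding)

    def entries(self) -> List[str]:
        # Passing bytes to os.listdir returns raw byte names, so the
        # configured path encoding alone decides how they are decoded.
        raw_root = os.fsencode(self.root)
        return sorted(self.decode_name(n) for n in os.listdir(raw_root))


print(FilesystemAdapter("/srv/repo", path_encoding="").path_encoding)  # UTF-8
latin = FilesystemAdapter("/srv/repo", path_encoding="ISO-8859-1")
print(latin.decode_name(b"Br\xfccke"))  # Brücke
```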

History

#1 Updated by Paul Rivier almost 9 years ago

  • Assignee set to Paul Rivier

Hi Toni,

I can't reproduce it here with characters like é, €, ä, ß or à. My only available environment is Linux, so it might be related to your Windows environment. Can anybody else confirm this bug and narrow it down as much as possible?

#2 Updated by Toni Kerschbaum almost 9 years ago

I just noticed that I forgot to upload the pictures, so here they are.

It could be related to my Windows environment (Windows Server 2003). If I can help you in any way to narrow or track down the problem, please let me know.

Also, the path in the first two pictures remains fully browsable (and the files downloadable), but some other paths and directories do not. Sadly, I can't find any notable difference in those paths and file names.

#3 Updated by Toshi MARUYAMA over 6 years ago

  • Assignee changed from Paul Rivier to Toshi MARUYAMA

#4 Updated by Toshi MARUYAMA over 6 years ago

  • Subject changed from Filesystem Repository and (german) special chars to Filesystem Repository path encoding of non UTF-8 characters

#5 Updated by Toshi MARUYAMA over 6 years ago

Try the patches in #2664 note-19.

These are screenshots from my Japanese Windows Vista.

#6 Updated by Toshi MARUYAMA over 6 years ago

  • Status changed from New to 7
  • Target version set to 1.2.0
  • % Done changed from 0 to 80

#7 Updated by Toshi MARUYAMA over 6 years ago

  • Status changed from 7 to Closed
  • % Done changed from 80 to 100
  • Resolution set to Fixed

I have finished the implementation up to r4944.

It is impossible to prepare a test tarball and test non-UTF-8 path encodings on every OS, filesystem and language.

For example, if a tarball contains files with Latin-1 encoded paths, I can't extract it on my Japanese Windows.

Please refer to:
http://mercurial.selenic.com/wiki/EncodingStrategy?action=recall&rev=6
