Defect #2664

Mercurial: Repository path encoding of non UTF-8 characters

Added by Jérémie Delaitre about 15 years ago. Updated about 13 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Toshi MARUYAMA
Category:
SCM
Target version:
Start date:
2009-02-04
Due date:
% Done:
100%

Estimated time:
Resolution:
Fixed
Affected version:

Description

Environment

  • Server OS: Debian Lenny
  • Redmine: svn rev 2361 (same problem with 0.8.0)
  • Ruby: 1.8.6
  • RubyGems: 1.3.1
  • Rails: 2.1.2
  • PostgreSQL: 8.3.5
  • Mercurial: 1.0.1
  • System locale: en_us.UTF8
  • Database encoding: utf8
  • Database locale: fr_FR.UTF8 (same problem with en_us.UTF8)

Error

Running: ruby script/runner "Repository.fetch_changesets" -e production gives the following errors:

/home/redmine/redmine-0.8.0/vendor/rails/railties/lib/commands/runner.rb:47: /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:147:in `log':
 RuntimeError: ERROR        C22021  Minvalid byte sequence for encoding "UTF8": 0xe97365
    HThis error can also happen if the byte sequence does not match the encoding
 expected by the server, which is controlled by "client_encoding".        Fwchar.c        L1545
   Rreport_invalid_encoding: INSERT INTO "changes" ("changeset_id", "action", "revision", "branch", "from_path",
 "path", "from_revision") VALUES(781, E'A', NULL, NULL, NULL,
 E'/Quantity/doc/Présentation du projet.pdf', NULL) RETURNING "id" (ActiveRecord::StatementInvalid)
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:484:in `execute'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:929:in `select_raw'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:916:in `select'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all_without_query_cache'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb:61:in `select_all'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:13:in `select_one'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:19:in `select_value'
        from /home/redmine/redmine-0.8.0/vendor/rails/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:433:in `insert'
         ... 31 levels...
        from /home/redmine/redmine-0.8.0/vendor/rails/railties/lib/commands/runner.rb:47
        from /home/redmine/apps/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
        from /home/redmine/apps/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
        from script/runner:3

The error seems quite similar to #834, #917, and #1663, but it does not happen on the same table. Here, the problem comes from the "changes" table, while the previously reported (and fixed) issues refer to a problem on the "changesets" table.

The problem seems to come from file paths that are not converted to UTF-8 (note the 'é' character in the file path).
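The bytes in the PostgreSQL message, 0xe97365, are exactly how "ése" in "Présentation" is written in ISO-8859-1 (Latin-1), so the repository most likely stored this file name in Latin-1 (an assumption; Windows-1252 encodes these characters the same way). A minimal sketch, assuming Ruby 1.9 or later (the Ruby 1.8.6 listed above has no String#encode), of why the INSERT fails and what the conversion looks like:

```ruby
# The path as raw bytes, assuming it was written in ISO-8859-1 (Latin-1).
raw = "/Quantity/doc/Pr\xE9sentation du projet.pdf".b

# Bytes 16..18 are the 0xe9 0x73 0x65 sequence from the PostgreSQL error.
p raw.bytes[16, 3].map { |b| format('0x%02x', b) }

# Interpreted as UTF-8 the bytes are invalid, which is why a database
# with UTF-8 encoding refuses the INSERT.
p raw.dup.force_encoding('UTF-8').valid_encoding?

# Converting from the assumed source encoding yields a valid UTF-8 path.
utf8 = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
p utf8
p utf8.valid_encoding?
```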

I have tried different encodings in the repository tab settings without success.


Files

issue-2664-0.9-stable-2010-04-11.patch (10.7 KB) Yuya Nishihara, 2010-04-11 04:39
git-bzr.patch (1.37 KB) Toshi MARUYAMA, 2010-06-07 15:03
20110207-db.diff (649 Bytes) Toshi MARUYAMA, 2011-02-07 10:39
20110207-git-cvs-fs.diff (2.09 KB) Toshi MARUYAMA, 2011-02-07 10:39
20110207-impl.diff (5.71 KB) Toshi MARUYAMA, 2011-02-07 10:39
hg-ruby-1.9.diff (3.06 KB) Toshi MARUYAMA, 2011-02-07 12:15
for-trunk-r4893-20110220.patch (8.2 KB) Toshi MARUYAMA, 2011-02-20 15:59
ruby-1.9.2-japanese-windows.png (50.5 KB) Toshi MARUYAMA, 2011-03-04 13:07

Related issues

Related to Redmine - Defect #5251: Git: Repository path encoding of non UTF-8 characters (Closed, Toshi MARUYAMA, 2010-04-07)

Related to Redmine - Defect #2274: Filesystem Repository path encoding of non UTF-8 characters (Closed, Toshi MARUYAMA, 2008-12-04)

Related to Redmine - Defect #3462: CVS: Repository path encoding of non UTF-8 characters (Closed, Toshi MARUYAMA, 2009-06-08)

Related to Redmine - Defect #7064: Mercurial adapter does not recognize non alphabetic nor numeric in UTF-8 copied files (Closed, Toshi MARUYAMA, 2010-12-07)

Related to Redmine - Feature #2799: Support for Bazaar's shared reposetories (created with init-repo) (New, 2009-02-21)

Related to Redmine - Defect #6090: Most binary files become corrupted when downloading from CVS repository browser when Redmine is running on a Windows server (Closed, Toshi MARUYAMA, 2010-08-09)

Related to Redmine - Feature #4050: Ruby 1.9 support (Closed, 2009-10-18)

Related to Redmine - Feature #3396: Git: use --encoding=UTF-8 in "git log" (Closed, Toshi MARUYAMA, 2009-05-20)

Related to Redmine - Defect #4773: Redmine+Git+PostgresSQL 8.4 fails with linux kernel tree (encoding) (Closed, Jean-Philippe Lang, 2010-02-09)

Has duplicate Redmine - Defect #5408: Mercurial and chinese code (Closed, 2010-04-30)

Has duplicate Redmine - Defect #3677: fetching changesets from Mercurial repository fails (Closed, Toshi MARUYAMA, 2009-07-27)

Has duplicate Redmine - Defect #8726: Redmine+Mercurial+PostgreSQL 9 falls with cyrrilic filenames in repository (Closed, 2011-06-30)

#1

Updated by Jérémie Delaitre about 15 years ago

I just noticed something weird with Mercurial.

When I try to remove the file mentioned above, Mercurial does not succeed...
So the problem may come from Mercurial rather than Redmine.

#2

Updated by Wei Li about 15 years ago

I have the same issue with Bazaar.

#3

Updated by Daniel Lima over 14 years ago

I have the same issue too. My environment is Redmine 0.8.4 on Windows Server 2003. My repository is Mercurial, with some special characters in file paths, like 'ç', 'ã', 'õ'.

#4

Updated by Yuya Nishihara about 14 years ago

That's because Mercurial (and also Git) treats file names as byte strings.
Here we need to convert them to UTF-8, but there's no reliable information about the file name encoding.
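To illustrate why the bytes alone carry no reliable encoding information, here is a small Ruby 1.9+ sketch: a single byte 0xE9 can be ruled out as UTF-8, but it decodes to a different, valid character in each of several single-byte encodings, so something outside the bytes (such as a per-repository setting) has to say which one was meant.

```ruby
# One byte from a file name, with no encoding information attached.
byte = "\xE9".b

# It is not valid UTF-8, so we can tell the name is *not* UTF-8 ...
p byte.dup.force_encoding('UTF-8').valid_encoding?

# ... but it is a valid character in many single-byte encodings,
# and each one gives a different answer.
%w[ISO-8859-1 Windows-1251 ISO-8859-7].each do |enc|
  puts "#{enc}: #{byte.dup.force_encoding(enc).encode('UTF-8')}"
end
```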

Wei Li wrote:

I have the same issue with Bazaar.

I'm not sure about Bazaar, but it must handle paths as UTF-8, so it seems strange.

#5

Updated by Rui Tang about 14 years ago

I'm using Redmine 0.9.3 on Windows Server 2003 and have the same problem.

C:\redmine-0.9>ruby script/runner "Repository.fetch_changesets" -e production
c:/ruby/lib/ruby/gems/1.8/gems/rails-2.3.5/lib/commands/runner.rb:48: c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract_adapter.rb:219:in `log': Mysql::Error: Incorrect string value: '\xB2\xE2\xCA\xD4\xB9\xDC...' for column 'path' at row 1: INSERT INTO `changes` (`changeset_id`, `action`, `revision`, `branch`, `from_path`, `path`, `from_revision`) VALUES (
ActiveRecord::StatementInvalid)
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/mysql_adapter.rb:323:in `execute'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/database_statements.rb:259:in `insert_sql'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/mysql_adapter.rb:333:in `insert_sql'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/database_statements.rb:44:in `insert_without_query_dirty'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/query_cache.rb:18:in `insert'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:2908:in `create_without_timestamps'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/timestamp.rb:53:in `create_without_callbacks'
        from c:/ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/callbacks.rb:266:in `create'
         ... 30 levels...
        from c:/ruby/lib/ruby/gems/1.8/gems/rails-2.3.5/lib/commands/runner.rb:48
        from c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
        from c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
        from script/runner:3

#6

Updated by Yuya Nishihara about 14 years ago

Yuya Nishihara wrote:

That's because Mercurial (and also Git) treats file names as byte strings.
Here we need to convert them to UTF-8, but there's no reliable information about the file name encoding.

Hi, I made a patch to fix the issue.
It adds a repositories.path_encoding column, which can be configured via the Settings -> Repository tab.
Since it changes the database schema, running rake db:migrate is necessary. Please try it with care.
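The patch itself is not reproduced here. As a rough sketch of what such a setting implies, assuming Ruby 1.9+ String#encode: raw names coming out of the SCM are converted to UTF-8 before they are stored, UTF-8 names are converted back before they are passed to the SCM command, and a blank setting means no conversion (mirroring the `unless path_encoding.blank?` check quoted later in this thread). The class PathEncodingConverter is a hypothetical name, not part of Redmine.

```ruby
# Hypothetical helper: converts file names between a repository's
# path_encoding and UTF-8. A blank encoding means "no conversion".
class PathEncodingConverter
  def initialize(path_encoding)
    @path_encoding = path_encoding.to_s.strip
  end

  # Raw bytes from the SCM -> UTF-8 string suitable for the database.
  def to_utf8(raw)
    return raw.dup.force_encoding('UTF-8') if @path_encoding.empty?
    raw.dup.force_encoding(@path_encoding).encode('UTF-8')
  end

  # UTF-8 string from Redmine -> raw bytes to pass to the SCM command.
  def from_utf8(path)
    return path.b if @path_encoding.empty?
    path.encode(@path_encoding).b
  end
end

conv = PathEncodingConverter.new('ISO-8859-1')
utf8 = conv.to_utf8("Pr\xE9sentation".b)
raw  = conv.from_utf8(utf8)
puts utf8
p raw.bytes == "Pr\xE9sentation".b.bytes
```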

#7

Updated by Toshi MARUYAMA almost 14 years ago

Yuya Nishihara wrote:

That's because Mercurial (and also Git) treats file names as byte strings.
Here we need to convert them to UTF-8, but there's no reliable information about the file name encoding.

Wei Li wrote:

I have the same issue with Bazaar.

I'm not sure about Bazaar, but it must handle paths as UTF-8, so it seems strange.

I asked about this Bazaar problem and #5578 on the Mercurial-ja Google group (in Japanese).
The cause is the same as for #5578.
Bazaar issue: want an option to set the output encoding, especially on win32.
I also got a suggestion that the XMLOutput plugin is better than "bzr log".

#8

Updated by Toshi MARUYAMA almost 14 years ago

The Git problem is reported in #5251.
I tried Git and Bazaar, and I could display paths with multi-byte characters.
This patch is for Git and Bazaar.

#9

Updated by Yuya Nishihara almost 14 years ago

Toshi Maruyama wrote:

The Git problem is reported in #5251.
I tried Git and Bazaar, and I could display paths with multi-byte characters.
This patch is for Git and Bazaar.

Git and Mercurial have exactly the same problem: they treat file names as bytes, so the Git part of the patch seems reasonable.

But Bazaar's problem sounds different to me. It lies in the communication layer between Redmine and Bazaar: they should talk in UTF-8, but currently do not.

#10

Updated by xiaoyu yin almost 14 years ago

To share my experience:
My systems are Windows XP SP3 and Windows Server 2003.
My steps were:
1. Uninstall Redmine and reinstall it.
2. Create an hg repository in the Redmine folder.
3. Import the patch.
4. Run the "rake db:migrate RAILS_ENV=production" command.
5. Restart the Redmine service.

The path_encoding column was added successfully.

Then I tested the encodings in the list one by one; "GBK" is the correct one for me.

Good luck!
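The bytes in the MySQL error from note #5 appear to be GBK as well: the first four, \xB2\xE2\xCA\xD4, are the GBK encoding of "测试" ("test"). A small Ruby 1.9+ sketch of the same trial-and-error approach; the candidate list is arbitrary and only for illustration:

```ruby
# First four bytes of the value MySQL rejected in note #5.
raw = [0xB2, 0xE2, 0xCA, 0xD4].pack('C*')

# Not valid UTF-8, hence "Incorrect string value".
p raw.dup.force_encoding('UTF-8').valid_encoding?

# Try candidates one by one, as was done with the Repository tab list.
# valid_encoding? only checks the byte structure, so passing it does not
# prove a candidate is right; a person still has to read the result.
%w[GBK Big5 Shift_JIS].each do |enc|
  candidate = raw.dup.force_encoding(enc)
  unless candidate.valid_encoding?
    puts "#{enc}: invalid bytes"
    next
  end
  begin
    puts "#{enc}: #{candidate.encode('UTF-8')}"
  rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
    puts "#{enc}: could not be converted"
  end
end

gbk_path = raw.dup.force_encoding('GBK').encode('UTF-8')
puts gbk_path
```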

#11

Updated by xiaoyu yin almost 14 years ago

By the way, if you have data in the database, please back it up first, and restore it after the path_encoding column has been added successfully.

#12

Updated by Toshi MARUYAMA almost 14 years ago

Additionally, you need to delete any repository setting created before the patch was applied, and recreate the same repository from the Redmine settings tab.

#13

Updated by Toshi MARUYAMA over 13 years ago

  • Status changed from New to Closed
#14

Updated by Toshi MARUYAMA over 13 years ago

  • Status changed from Closed to Reopened
  • Assignee set to Toshi MARUYAMA
  • Priority changed from High to Low
#15

Updated by Toshi MARUYAMA over 13 years ago

  • Status changed from Reopened to 7
#16

Updated by Toshi MARUYAMA over 13 years ago

  • Target version set to Unplanned backlogs
#17

Updated by bo ye about 13 years ago

Please fix this first in a later version of Redmine (like 1.1.2?) if the #4455 Mercurial overhaul cannot be done soon.
This problem has stopped us from using hg with Redmine completely.

#18

Updated by Toshi MARUYAMA about 13 years ago

  • Subject changed from Redmine+Mercurial+PostgreSQL: path encoding and multi-bytes characters to Repository path encoding of non UTF-8 characters (Mercurial, Git and CVS)
#19

Updated by Toshi MARUYAMA about 13 years ago

These are patches for svn trunk r4799 and 1.1 stable r4800.

#20

Updated by Toshi MARUYAMA about 13 years ago

  • Subject changed from Repository path encoding of non UTF-8 characters (Mercurial, Git and CVS) to Repository path encoding of non UTF-8 characters (Mercurial, Git, CVS and Filesystem)
#21

Updated by Toshi MARUYAMA about 13 years ago

This is an ad hoc Mercurial adapter patch for Redmine SVN trunk and Ruby 1.9.
I confirmed that it runs on my Japanese Windows Vista with MinGW Ruby 1.9.2.

There is another "IO.popen" issue, #6090.
source:tags/1.1.1/lib/redmine/scm/adapters/abstract_adapter.rb#L184

I think we need to refactor the "IO.popen" calls, as Yuya's Mercurial overhaul does.
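A minimal sketch, assuming Ruby 1.9+ and not taken from Redmine or the overhaul patch, of one way to make command execution encoding-neutral: the array form of IO.popen passes arguments without going through a shell, and "rb" mode returns the output as raw ASCII-8BIT bytes without newline translation, so Encoding.default_external never reinterprets it. The adapter can then decode explicitly. Ruby itself stands in for hg so the sketch runs anywhere; run_raw is a hypothetical helper.

```ruby
require 'rbconfig'

# Hypothetical helper: run a command without a shell and return its
# standard output as raw bytes (ASCII-8BIT).
def run_raw(*argv)
  IO.popen(argv, 'rb') { |io| io.read }
end

# Ruby stands in for `hg` here: the child writes the Latin-1 bytes "Pr\xE9".
child = '$stdout.binmode; $stdout.write([0x50, 0x72, 0xE9].pack("C*"))'
raw = run_raw(RbConfig.ruby, '-e', child)

p raw.encoding # binary, not reinterpreted
text = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts text
```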

#22

Updated by Paolo Losi about 13 years ago

I can confirm that the patches (see note 19) solve the problem for us.
Since the issue is blocking, we would like to know if
there is a way to back out the patches and undo the schema
migration once there is an official release that addresses this issue.

Thanks

#23

Updated by Paolo Losi about 13 years ago

Paolo Losi wrote:

I can confirm that the patches (see note 19) solve the problem for us.
Since the issue is blocking, we would like to know if
there is a way to back out the patches and undo the schema
migration once there is an official release that addresses this issue.

Answering myself:

rake db:migrate:down VERSION=<version of the path_encoding migration>

Sorry for the noise

#24

Updated by bo ye about 13 years ago

Wow, these patches work great!
It seems even better than before; at least now issues can be linked with r####.
Please get this into the next minor version, 1.1.2. You have my vote. :)

There is a minor problem with the patches, though: they don't work with the code review plugin (redmine_code_review). The error on the repository page:

NoMethodError in Code_review#update_revisions_view 
Showing vendor/plugins/redmine_code_review/app/views/code_review/_update_revisions.html.erb where line #6 raised: 

undefined method `review_count' for #<Changeset:0x63e7320>
Extracted source (around line #6): 

3: # and open the template in the editor.
4: %>
5: 
6: <script type="text/javascript">
7: <% @changesets.each do |changeset| %>
8:   <%
9:   if changeset.review_count > 0

Toshi MARUYAMA wrote:

These are patches for svn trunk r4799 and 1.1 stable r4800.

#25

Updated by Toshi MARUYAMA about 13 years ago

bo ye wrote:

Please get this into the next minor version, 1.1.2. You have my vote. :)

This feature involves a big behaviour change and a DB migration.
So, I think it is difficult to apply it to 1.1-stable.
But we need to consider applying it to 1.2.

Yuya, what do you think?

#26

Updated by Yuya Nishihara about 13 years ago

Toshi MARUYAMA wrote:

bo ye wrote:

Please get this into the next minor version, 1.1.2. You have my vote. :)

This feature involves a big behaviour change and a DB migration.
So, I think it is difficult to apply it to 1.1-stable.
But we need to consider applying it to 1.2.

Yuya, what do you think?

Same idea. For now, you can work around the issue as follows:

  1. add lib/redmine/scm/adapters/path_encodable_wrapper.rb
  2. apply the patch only to app/models/repository.rb
  3. instead of running db:migrate, replace this line in the new_scm method:
    scm = Redmine::Scm::Adapters::PathEncodableWrapper.new(scm, path_encoding) unless path_encoding.blank?

    with
    scm = Redmine::Scm::Adapters::PathEncodableWrapper.new(scm, 'encoding-name-of-your-repo')
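The real PathEncodableWrapper is in the patch and is not shown here. Judging only from how it is constructed above (wrapping the adapter together with an encoding name), it presumably forwards calls to the adapter and converts paths on the way in and out; the sketch below shows that delegation pattern with Ruby's standard SimpleDelegator, assuming Ruby 1.9+. FakeHgAdapter, its methods, and EncodingWrapper are all hypothetical.

```ruby
require 'delegate'

# Hypothetical stand-in for an SCM adapter: it returns file names as raw
# bytes, the way `hg` prints them, here assumed to be ISO-8859-1.
class FakeHgAdapter
  def entries(_path)
    ["Pr\xE9sentation du projet.pdf".b]
  end

  def url
    '/var/hg/quantity'
  end
end

# Hypothetical wrapper: every method is forwarded to the wrapped adapter,
# but #entries converts the requested path to the repository encoding and
# the returned names to UTF-8.
class EncodingWrapper < SimpleDelegator
  def initialize(adapter, path_encoding)
    super(adapter)
    @path_encoding = path_encoding
  end

  def entries(path)
    raw_path = path.encode(@path_encoding)
    __getobj__.entries(raw_path).map do |name|
      name.dup.force_encoding(@path_encoding).encode('UTF-8')
    end
  end
end

scm = EncodingWrapper.new(FakeHgAdapter.new, 'ISO-8859-1')
puts scm.entries('/doc') # names now in UTF-8
puts scm.url             # forwarded unchanged
```

Subclassing SimpleDelegator keeps the wrapper small: only the methods that handle paths need to be overridden, and everything else reaches the original adapter untouched.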
#27

Updated by Toshi MARUYAMA about 13 years ago

Ruby 1.9 compatibility and tests are serious concerns.
Please see source:trunk/test/unit/lib/redmine/scm/adapters/git_adapter_test.rb@4810#L77 .

#28

Updated by Toshi MARUYAMA about 13 years ago

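Japanese Shift_JIS and Traditional Chinese Big5 have the 0x5C (backslash) problem: the second byte of a double-byte character can be 0x5C, so byte-oriented code that looks for ASCII characters can misread them.
Japanese EUC-JP does not have this problem, because every byte of a multibyte character is 0x80 or above.

Ruby uses the ANSI API to start processes on Windows.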
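The 0x5C problem can be seen directly in Ruby 1.9+: in Shift_JIS, the second byte of "表" is 0x5C, the same value as an ASCII backslash, and Big5 has the same issue with characters such as "許". Byte-oriented code (for example, splitting a Windows path on "\") can therefore cut a character in half, while EUC-JP keeps every byte of a multibyte character at 0x80 or above.

```ruby
# "表" ("table") is a well-known Shift_JIS example: its second byte is 0x5C.
sjis = "表".encode('Shift_JIS')
p sjis.bytes.map { |b| format('0x%02X', b) }

# Byte-oriented code sees a backslash that is not really there,
# and splitting on it destroys the character.
p sjis.b.include?("\\")
p sjis.b.split("\\")

# Encoding-aware code sees a single character.
p sjis.chars.size

# Big5 has the same problem.
p "許".encode('Big5').bytes.map { |b| format('0x%02X', b) }

# In EUC-JP every byte of a multibyte character is >= 0x80, so it can
# never be mistaken for an ASCII character such as "\".
p "表".encode('EUC-JP').bytes.all? { |b| b >= 0x80 }
```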

#29

Updated by Toshi MARUYAMA about 13 years ago

Subversion supports URL encoding for paths, and Redmine uses it.
I think the Redmine Mercurial adapter needs to wrap the command-line paths of cat, diff and annotate, as the helper extension in Yuya's Mercurial overhaul does.
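As a sketch of the general idea only (the helper extension's actual scheme is not described here and may differ), path arguments can be percent-encoded byte by byte so that the command line carries pure ASCII regardless of the Windows ANSI code page, and decoded back to the original bytes on the receiving side. encode_path_arg and decode_path_arg are hypothetical helpers, assuming Ruby 1.9+.

```ruby
# Bytes that may pass through unchanged; everything else is escaped.
UNSAFE_BYTE = %r{[^A-Za-z0-9\-._~/]}

# Hypothetical helper: turn arbitrary path bytes into a pure-ASCII argument.
def encode_path_arg(path)
  path.b.gsub(UNSAFE_BYTE) { |c| format('%%%02X', c.ord) }
end

# Hypothetical helper: what the receiving side would do to get the bytes back.
def decode_path_arg(arg)
  arg.b.gsub(/%([0-9A-Fa-f]{2})/) { [Regexp.last_match(1).hex].pack('C') }
end

raw = "doc/Pr\xE9sentation du projet.pdf".b
arg = encode_path_arg(raw)
puts arg # doc/Pr%E9sentation%20du%20projet.pdf
p decode_path_arg(arg) == raw
```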

#30

Updated by Toshi MARUYAMA about 13 years ago

I have started implementing this in a new way.
Ruby 1.9 compatibility is a serious concern.

#31

Updated by Toshi MARUYAMA about 13 years ago

  • Subject changed from Repository path encoding of non UTF-8 characters (Mercurial, Git, CVS and Filesystem) to Repository path encoding of non UTF-8 characters (Mercurial and Filesystem)
  • Priority changed from Low to Normal
  • % Done changed from 0 to 20
#32

Updated by Toshi MARUYAMA about 13 years ago

  • Subject changed from Repository path encoding of non UTF-8 characters (Mercurial and Filesystem) to Mercurial: Repository path encoding of non UTF-8 characters
  • % Done changed from 20 to 60
#33

Updated by Toshi MARUYAMA about 13 years ago

  • % Done changed from 60 to 90
#34

Updated by Toshi MARUYAMA about 13 years ago

I can't run it on my Japanese Windows Ruby 1.9.2 without Ruby-1.9-Encoding.default_external.diff from #4050.
Even with this patch applied, I got the following error:

[2011-03-04 20:51:58] ERROR Encoding::InvalidByteSequenceError: "\x9C" followed by "-" on Windows-31J
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rails-2.3.11/lib/rails/rack/static.rb:37:in `file?'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rails-2.3.11/lib/rails/rack/static.rb:37:in `file_exist?'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rails-2.3.11/lib/rails/rack/static.rb:18:in `call'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.1.0/lib/rack/urlmap.rb:47:in `block in call'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.1.0/lib/rack/urlmap.rb:41:in `each'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.1.0/lib/rack/urlmap.rb:41:in `call'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rails-2.3.11/lib/rails/rack/log_tailer.rb:17:in `call'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.1.0/lib/rack/content_length.rb:13:in `call'
        r:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.1.0/lib/rack/handler/webrick.rb:48:in `service'
        r:/Ruby192/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
        r:/Ruby192/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
        r:/Ruby192/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
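The error means Ruby was decoding bytes as Windows-31J (the Microsoft variant of Shift_JIS, and Ruby's default external encoding on Japanese Windows) and hit 0x9C, a double-byte lead byte, followed by "-" (0x2D), which is not a valid trail byte. The trace does not show where the bytes came from; one plausible source, assumed here only for illustration, is a UTF-8 name being read as Windows-31J, since the UTF-8 encoding of "ぜ" is E3 81 9C. A minimal Ruby 1.9+ reproduction:

```ruby
bytes = "\x9C-".b

begin
  bytes.dup.force_encoding('Windows-31J').encode('UTF-8')
rescue Encoding::InvalidByteSequenceError => e
  puts e.message # reports "\x9C" followed by "-" on Windows-31J
  p e.error_bytes, e.readagain_bytes
end

# One way to keep going: replace invalid bytes instead of raising.
lenient = bytes.dup.force_encoding('Windows-31J').encode('UTF-8', invalid: :replace, replace: '?')
p lenient
```

Replacing invalid bytes hides the problem rather than fixing it, so it only makes sense for display; paths stored in the database still need the correct source encoding.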

#35

Updated by Toshi MARUYAMA about 13 years ago

The "Files" module shows similar strange behavior on my Japanese Windows Ruby 1.9.2.
I have given up on fixing it.

#36

Updated by Toshi MARUYAMA about 13 years ago

  • Status changed from 7 to Closed
  • % Done changed from 90 to 100
  • Resolution set to Fixed

I finished implementing this feature as of r5001,
and I confirmed that it runs on my Japanese Windows with Ruby 1.8 and on Linux with Ruby 1.8.

On Linux with Ruby-1.9-Encoding.default_external.diff from #4050, I confirmed that it runs in an ISO-8859-1 locale.
