Feature #1341

closed

keep consistency between browser encoding and mysql database encoding

Added by Gilles Ballanger almost 16 years ago. Updated almost 16 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
Start date:
2008-06-01
Due date:
% Done:

0%

Estimated time:
Resolution:
Invalid

Description

Hello,

After lazily importing issues directly into the MySQL database (I know this is bad practice; it is better to use a Ruby import script via the Redmine API), I see that the issue subject (and the description too) is badly UTF-8 encoded:

If I import a record via SQL using

INSERT INTO `issues` (`tracker_id`, `project_id`, `subject`, `description`, `due_date`, `category_id`, `status_id`, `assigned_to_id`, `priority_id`, `fixed_version_id`, `author_id`, `lock_version`, `created_on`, `updated_on`, `start_date`, `done_ratio`, `estimated_hours`) VALUES 
(4, 1, 'é', 'é', NULL, NULL, 1, NULL, 4, 5, 3, 0, '2008-05-30 14:19:43', '2008-05-30 14:19:43', '2008-05-30', 0, NULL);

the resulting database dump for this record is
INSERT INTO `issues` (`id`, `tracker_id`, `project_id`, `subject`, `description`, `due_date`, `category_id`, `status_id`, `assigned_to_id`, `priority_id`, `fixed_version_id`, `author_id`, `lock_version`, `created_on`, `updated_on`, `start_date`, `done_ratio`, `estimated_hours`) VALUES 
(234, 4, 1, 0xc3a9, 0xc3a9, NULL, NULL, 1, NULL, 4, 5, 3, 0, '2008-05-30 14:19:43', '2008-05-30 14:19:43', '2008-05-30', 0, NULL);

and the browser shows a '�' character in place of 'é'.

If I insert an issue via the browser with subject='é' and description='é', the dumped database is

INSERT INTO `issues` (`id`, `tracker_id`, `project_id`, `subject`, `description`, `due_date`, `category_id`, `status_id`, `assigned_to_id`, `priority_id`, `fixed_version_id`, `author_id`, `lock_version`, `created_on`, `updated_on`, `start_date`, `done_ratio`, `estimated_hours`) VALUES 
(235, 1, 1, 0xc383c2a9, 0xc383c2a9, NULL, NULL, 1, NULL, 4, NULL, 3, 0, '2008-06-01 13:14:20', '2008-06-01 13:14:20', '2008-06-01', 0, NULL);

=> the 'é' character was encoded as hex c3 83 c2 a9 (the correct UTF-8 encoding is c3 a9)

This produces "Ã©" in place of "é" in the MySQL database dump, but a correct 'é' character when the issue is displayed in Redmine.
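
As a quick check, the hex values from the two dumps can be decoded as UTF-8 in Python 3. This is only a sketch: the `describe` helper and the variable names are illustrative and are not part of Redmine or MySQL.

```python
# Hex values copied from the two database dumps above.
imported_via_sql = bytes.fromhex("c3a9")
inserted_via_browser = bytes.fromhex("c383c2a9")


def describe(raw: bytes) -> str:
    """Decode raw bytes as UTF-8 and list the resulting code points."""
    text = raw.decode("utf-8")
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    return f"{raw.hex()} -> {code_points}"


print(describe(imported_via_sql))      # c3a9 -> U+00E9
print(describe(inserted_via_browser))  # c383c2a9 -> U+00C3 U+00A9
```

U+00E9 is 'é', while U+00C3 followed by U+00A9 is the two-character sequence "Ã©", which matches what the dump shows for the record inserted via the browser.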

My knowledge of Ruby is not sufficient to reproduce this kind of string encoding interpretation, but I can do it in Python.
First I encode the 'é' character in UTF-8:

>>> unicode("é","utf-8").encode("utf-8")
'\xc3\xa9'

If I take each byte value, declare it as a Unicode string, and re-encode it in UTF-8, I get the same bad encoding:

>>> (u"\xc3").encode("utf-8")
'\xc3\x83'
>>> (u"\xa9").encode("utf-8")
'\xc2\xa9'

So perhaps there is a double encoding conversion somewhere between what is sent from the browser and what is written to the database?
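
The suspected double conversion can be reproduced in one step in Python 3. This is a minimal sketch, assuming the intermediate step misreads the UTF-8 bytes as Latin-1; the actual charset involved on the Redmine/MySQL side is not confirmed here, and `double_encode` and `undo_double_encode` are hypothetical helper names.

```python
def double_encode(text: str, wrong_charset: str = "latin-1") -> bytes:
    """Encode to UTF-8, misread the bytes as wrong_charset, encode again."""
    utf8_bytes = text.encode("utf-8")           # correct encoding: c3 a9
    misread = utf8_bytes.decode(wrong_charset)  # each byte becomes one character
    return misread.encode("utf-8")              # second, unwanted encoding


def undo_double_encode(raw: bytes, wrong_charset: str = "latin-1") -> str:
    """Reverse double_encode, assuming the same wrong_charset was used."""
    return raw.decode("utf-8").encode(wrong_charset).decode("utf-8")


e_acute = "\u00e9"  # the 'é' character, written as an escape to keep the source ASCII
stored = double_encode(e_acute)
print(stored.hex())                           # c383c2a9
print(undo_double_encode(stored) == e_acute)  # True
```

The result matches the c3 83 c2 a9 bytes seen in the dump of the issue created via the browser, which is consistent with the double-encoding hypothesis but does not show where in the stack the conversion happens.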

Once again, importing directly into the database is a very bad idea (this is a perfect example), but this inconsistency between the database encoding and the page rendering could be a source of problems in the future...
