Project

General

Profile

Actions

Feature #31921

closed

Changes to properly support 4 byte characters (emoji) when database is MySQL

Added by Marius BĂLTEANU over 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Database
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Resolution:
Fixed

Description

Currently, Redmine with MySQL as database doesn't support very well 4 byte characters (for ex: emojis) and you need some experience with MySQL in order to properly configure MySQL (and Redmine) to support those characters. Because of this, there are reported plenty of tickets that I'm going to relate to this issue later.

I would like to work on this in order to improve the default Redmine installation and also, to provide some documentation regarding how you can migrate an existing Redmine installation to support 4 byte characters.

For now, I've created two patches:
1. 0001-Update-default-database-config-for-MySQL.patch
- Adds to the database.yml.example the required encoding and collation, including a note to inform users that are safe only for new installations.
- Adds a test to ensure that an issue can be created using a emoji in description
- Updates some old instructions
- These settings will be default in Rails 6: https://github.com/rails/rails/pull/33608

2. attachment:0002-Task-to-check-mysql-support-for-utf8mb4.patch
This task checks the MySQL configuration (innodb_file_per_table, innodb_large_prefix, innodb_file_format) and ENGINE, ROW_FORMAT and TABLE_COLLATION for Redmine tables. Any feedback or ideas to improve this task are welcome!

I'll try to create later a Wiki page regarding the steps required to migrate and existing installation or at least some useful links.


Files


Related issues

Related to Redmine - Defect #24242: Adding comments to ticket returns internal server errorNeeds feedback

Actions
Related to Redmine - Defect #28774: Internal Error when Submit the Description with Vietnamese in Unicode fontsClosed

Actions
Related to Redmine - Defect #27406: Internal Server Error while posting smile in issue's descriptionClosed

Actions
Related to Redmine - Defect #25959: "Smile of the death" problemClosed

Actions
Related to Redmine - Defect #24992: MailHandler: an unexpected error occurred when receiving email: invalid byte sequence in UTF-8Needs feedback

Actions
Related to Redmine - Defect #23586: Create index on mysql exceed limitsClosed

Actions
Related to Redmine - Defect #22119: Error: Data too long for column 'notes' when copy paste picturesClosed

Actions
Related to Redmine - Defect #21398: Mysql: 500 server error when submitting 4 bytes utf8 (to be saved in the 'notes' field)Closed

Actions
Related to Redmine - Defect #20143: Mailhandler cannot handle 4-byte charactersNew

Actions
Related to Redmine - Patch #19742: RedmineInstall: MySQL: collation_databaseClosed

Actions
Related to Redmine - Defect #18866: MySQL: disappear after 4-Byte UTF-8Closed

Actions
Related to Redmine - Defect #30848: Error when creating issue with emoji in descriptionNew

Actions
Related to Redmine - Defect #27984: Arabic Support IssuesClosed

Actions
Related to Redmine - Defect #27803: Can't post a smiley face in issues descriptionClosed

Actions
Related to Redmine - Defect #27361: Failed when using EmojiClosed

Actions
Related to Redmine - Defect #27238: Mysql Error after upgrading Redmine from 2.5 to 3.*Closed

Actions
Related to Redmine - Defect #26386: Mysql: Unable to update ticket with EmojiClosed

Actions
Related to Redmine - Defect #24030: When SVN or Git repository has a commit comment include an emoji (4 bytes charactor), error occursClosed

Actions
Related to Redmine - Defect #23557: Special (micro) character in message field causes internal server errorClosed

Actions
Related to Redmine - Defect #22618: subject utf-8 char vs mysql2Closed

Actions
Related to Redmine - Defect #19334: Error 500 when uploading file with umlaut in filename from MacClosed

Actions
Related to Redmine - Defect #10772: 4-byte utf-8 charactersClosed

Actions
Related to Redmine - Patch #32054: Add test for 4 byte characters (emoji) supportClosedGo MAEDA

Actions
Related to Redmine - Defect #34235: Losing data due to unsupported charactersClosed

Actions
Has duplicate Redmine - Defect #33649: Ascii code in the 'Description' causes 500 errorClosed

Actions
Has duplicate Redmine - Defect #34574: Email Fetcher catch no Mails with smileyClosed

Actions
Actions #1

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #24242: Adding comments to ticket returns internal server error added
Actions #2

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #28774: Internal Error when Submit the Description with Vietnamese in Unicode fonts added
Actions #3

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #27406: Internal Server Error while posting smile in issue's description added
Actions #4

Updated by Marius BĂLTEANU over 4 years ago

Actions #5

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #24992: MailHandler: an unexpected error occurred when receiving email: invalid byte sequence in UTF-8 added
Actions #6

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #23586: Create index on mysql exceed limits added
Actions #7

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #22119: Error: Data too long for column 'notes' when copy paste pictures added
Actions #8

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #21398: Mysql: 500 server error when submitting 4 bytes utf8 (to be saved in the 'notes' field) added
Actions #9

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #20143: Mailhandler cannot handle 4-byte characters added
Actions #10

Updated by Marius BĂLTEANU over 4 years ago

  • Description updated (diff)
Actions #11

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Patch #19742: RedmineInstall: MySQL: collation_database added
Actions #12

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #18866: MySQL: disappear after 4-Byte UTF-8 added
Actions #13

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #30848: Error when creating issue with emoji in description added
Actions #14

Updated by Marius BĂLTEANU over 4 years ago

Actions #15

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #27803: Can't post a smiley face in issues description added
Actions #16

Updated by Marius BĂLTEANU over 4 years ago

Actions #17

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #27238: Mysql Error after upgrading Redmine from 2.5 to 3.* added
Actions #18

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #26386: Mysql: Unable to update ticket with Emoji added
Actions #19

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #24030: When SVN or Git repository has a commit comment include an emoji (4 bytes charactor), error occurs added
Actions #20

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #23557: Special (micro) character in message field causes internal server error added
Actions #21

Updated by Marius BĂLTEANU over 4 years ago

Actions #22

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Defect #19334: Error 500 when uploading file with umlaut in filename from Mac added
Actions #23

Updated by Marius BĂLTEANU over 4 years ago

Actions #24

Updated by Go MAEDA over 4 years ago

As mentioned in #19742#note-3, the collation utf8mb4_unicode_ci is really problematic for Japanese users. I think it is better to leave it to users without setting a default value. At least, we should not set utf8mb4_unicode_ci as the default value. Or, setting utf8mb4_general_ci or utf8mb4_bin is better.

The problem of utf8mb4_unicode_ci is that it treats many different kinds of characters as the same when comparing. For example, the following combinations of Japanese words are treated as the same words.

  • "はは" (means mother, pronounced as "haha") and "パパ" (means dad, pronounced as "papa")
  • "からす" (means a crow, pronounced as "karasu") and "ガラス" (means glass, pronounced as "garasu")
  • "カメラ" (means a camera, pronounced as "kamera") and "ガメラ" (means Gamera, pronounced as "gamera")
  • "パリ" (menas Paris, pronounced as "pari") and "バリ" (means Bali in Indonesia, pronounced as "bari")

Imagine that searching the word "からす" hits not only "からす" (a crow) but also "ガラス" (glass). Setting the collation to utf8mb4_unicode_ci makes issues filter and full-text search unusable in some languages.

Actions #25

Updated by Marius BĂLTEANU over 4 years ago

  • File 0001-Update-default-database-config-for-MySQL.patch added
  • File 0002-Task-to-check-mysql-support-for-utf8mb4.patch added

Thanks for your detailed response, I totally missed that note and now I understand the problem.

I've removed collation from default settings.

Actions #26

Updated by Marius BĂLTEANU over 4 years ago

  • File deleted (0001-Update-default-database-config-for-MySQL.patch)
Actions #27

Updated by Marius BĂLTEANU over 4 years ago

  • File deleted (0002-Task-to-check-mysql-support-for-utf8mb4.patch)
Actions #28

Updated by Marius BĂLTEANU over 4 years ago

  • File deleted (0001-Update-default-database-config-for-MySQL.patch)
Actions #29

Updated by Marius BĂLTEANU over 4 years ago

  • File deleted (0002-Task-to-check-mysql-support-for-utf8mb4.patch)
Actions #30

Updated by Marius BĂLTEANU over 4 years ago

Go Maeda, it is ok from you point of view to set only the encoding to encoding: utf8mb4?

Actions #31

Updated by Go MAEDA over 4 years ago

Marius BALTEANU wrote:

Go Maeda, it is ok from you point of view to set only the encoding to encoding: utf8mb4?

Yes, I think it is OK.

But I am not sure whether the comment "# Remove encoding for existing installation on MySQL" is necessary.

It would be even better if there is a comment like "Use "utf8" instead of "utf8mb4" if you use MySQL earlier than #{VERSION}".

Actions #32

Updated by Go MAEDA over 4 years ago

I think the change against test/unit/issue_test.rb should not be merged because it may break tests on the official CI server if the encoding of the MySQL database on the CI server is not utf8mb4.
http://www.redmine.org/builds/

No one except Jean-Philippe Lang can update configurations of the CI server.

Actions #33

Updated by Go MAEDA over 4 years ago

I updated Marius's patch:

  • Removed "with ruby1.9". The version of Ruby is unnecessary because Redmine no longer supports Ruby 1.8 that require 'mysql' adapter instead of 'mysql2' adapter. Please see the example file in Redmine 2.6 for reference: source:tags/2.6.9/config/database.yml.example
  • Removed the test not to break the CI server
  • Added the comment "Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7"
  • Removed "# Remove encoding for existing installation on MySQL" because an existing installation may require encoding setting
diff --git a/config/database.yml.example b/config/database.yml.example
index 57bc51605..727b4d89b 100644
--- a/config/database.yml.example
+++ b/config/database.yml.example
@@ -1,4 +1,4 @@
-# Default setup is given for MySQL with ruby1.9.
+# Default setup is given for MySQL 5.7.7 or later.
 # Examples for PostgreSQL, SQLite3 and SQL Server can be found at the end.
 # Line indentation must be 2 spaces (no tabs).

@@ -8,7 +8,8 @@ production:
   host: localhost
   username: root
   password: "" 
-  encoding: utf8
+  # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+  encoding: utf8mb4

 development:
   adapter: mysql2
@@ -16,7 +17,8 @@ development:
   host: localhost
   username: root
   password: "" 
-  encoding: utf8
+  # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+  encoding: utf8mb4

 # Warning: The database defined as "test" will be erased and
 # re-generated from your development database when you run "rake".
@@ -27,7 +29,8 @@ test:
   host: localhost
   username: root
   password: "" 
-  encoding: utf8
+  # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+  encoding: utf8mb4

 # PostgreSQL configuration example
 #production:
Actions #34

Updated by Marius BĂLTEANU over 4 years ago

  • Assignee deleted (Marius BĂLTEANU)
  • Target version changed from Candidate for next major release to 4.1.0

Go MAEDA wrote:

I updated Marius's patch:

  • Removed "with ruby1.9". The version of Ruby is unnecessary because Redmine no longer supports Ruby 1.8 that require 'mysql' adapter instead of 'mysql2' adapter. Please see the example file in Redmine 2.6 for reference: source:tags/2.6.9/config/database.yml.example
  • Removed the test not to break the CI server
  • Added the comment "Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7"
  • Removed "# Remove encoding for existing installation on MySQL" because an existing installation may require encoding setting

Thanks for updating my patch, I agree with your changes. Regarding the test, I'll create a new ticket assigned to Jean-Philippe to update the CI server. Until then, I propose to deliver this in 4.1.0.

Actions #35

Updated by Go MAEDA over 4 years ago

  • Status changed from New to Closed
  • Assignee set to Go MAEDA
  • Resolution set to Fixed

Committed the updated patch in #31921#note-33. Thanks.

Actions #36

Updated by Marius BĂLTEANU over 4 years ago

  • Related to Patch #32054: Add test for 4 byte characters (emoji) support added
Actions #37

Updated by Deoren Moor over 4 years ago

Marius BALTEANU wrote:

I would like to work on this in order to improve the default Redmine installation and also, to provide some documentation regarding how you can migrate an existing Redmine installation to support 4 byte characters.

...

I'll try to create later a Wiki page regarding the steps required to migrate and existing installation or at least some useful links.

I understand that this feature has only just released today in v4.1.0, but is there a new page somewhere covering the conversion steps? I looked for it, but didn't spot it.

If the page doesn't exist yet, would you be willing to stub out such a page so that others could opt-in for (Watch) notifications when the page is updated?

Many thanks for taking the time to push this forward.

Actions #38

Updated by Gérard L about 4 years ago

Deoren Moor wrote:

Marius BALTEANU wrote:

I would like to work on this in order to improve the default Redmine installation and also, to provide some documentation regarding how you can migrate an existing Redmine installation to support 4 byte characters.

...

I'll try to create later a Wiki page regarding the steps required to migrate and existing installation or at least some useful links.

I understand that this feature has only just released today in v4.1.0, but is there a new page somewhere covering the conversion steps? I looked for it, but didn't spot it.

If the page doesn't exist yet, would you be willing to stub out such a page so that others could opt-in for (Watch) notifications when the page is updated?

Many thanks for taking the time to push this forward.

+1

Actions #39

Updated by Martin Rys about 4 years ago

With the help of this article I've managed to convert from utf8 to utf8mb4 on a 4.1.0 bitnami install
http://blog.andriylesyuk.com/4-byte-utf-8-problem-in-redmine-and-how-to-fix-it/

Both database.yml and my.cnf were using utf8mb4, so there was nothing to do there for me.

I've already had row_format set to DYNAMIC (meaning InnoDB was already Barracuda and not Antelope, verified via SHOW TABLE STATUS), so I only listed all tables in the database

SELECT table_name, CCSA.character_set_name, CCSA.collation_name FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_coll
ation AND T.table_schema = "bitnami_redmineplusagile";

And converted every single table from utf8 to utf8mb4 with the appropriate collation (Czech here since this is a Czech install)

...
ALTER TABLE wikis CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci;
ALTER TABLE workflows CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_czech_ci;

After that I verified that everything works by posting a comment in an issue with an emoji

EDIT: Ironically it seems that posting an emoji in here will cut the text off.

Actions #40

Updated by Holger Just almost 4 years ago

  • Has duplicate Defect #33649: Ascii code in the 'Description' causes 500 error added
Actions #41

Updated by Marius BĂLTEANU about 3 years ago

  • Related to Defect #34235: Losing data due to unsupported characters added
Actions #42

Updated by Marius BĂLTEANU about 3 years ago

  • Has duplicate Defect #34574: Email Fetcher catch no Mails with smiley added
Actions

Also available in: Atom PDF