Feature #31921

Changes to properly support 4 byte characters (emoji) when database is MySQL

Added by Marius BALTEANU 3 months ago. Updated 2 months ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: Go MAEDA
% Done: 0%
Category: Database
Target version: 4.1.0
Resolution: Fixed

Description

Currently, Redmine on MySQL does not handle 4-byte characters (for example, emoji) well, and you need some MySQL experience to configure MySQL (and Redmine) properly to support them. Because of this, plenty of tickets have been reported; I'm going to relate them to this issue later.
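
For context: MySQL's legacy utf8 character set stores at most three bytes per character, while emoji and other characters outside the Basic Multilingual Plane take four bytes in UTF-8. The following is a minimal Ruby sketch of how to spot such characters in a string. The helper name four_byte_chars is hypothetical and is not part of Redmine.

```ruby
# Hypothetical helper, not part of Redmine: returns the characters of a
# string that need 4 bytes in UTF-8 (code points above U+FFFF).
def four_byte_chars(text)
  text.each_char.select { |ch| ch.bytesize == 4 }
end

puts "\u00E9".bytesize      # "é" takes 2 bytes
puts "\u306F".bytesize      # "は" takes 3 bytes
puts "\u{1F600}".bytesize   # the grinning-face emoji takes 4 bytes

# Only the emoji is reported; the other characters fit into 3 bytes.
p four_byte_chars("caf\u00E9 \u306F\u306F \u{1F600}")
```

A string for which this helper returns a non-empty array is the kind of input that triggers the errors described in the related tickets when the column uses the 3-byte utf8 character set.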

I would like to work on this to improve the default Redmine installation and also to provide some documentation on how to migrate an existing Redmine installation to support 4-byte characters.

For now, I've created two patches:
1. 0001-Update-default-database-config-for-MySQL.patch
- Adds the required encoding and collation to database.yml.example, including a note informing users that these settings are safe only for new installations.
- Adds a test to ensure that an issue can be created with an emoji in the description
- Updates some old instructions
- These settings will be default in Rails 6: https://github.com/rails/rails/pull/33608

2. 0002-Task-to-check-mysql-support-for-utf8mb4.patch
This task checks the MySQL configuration (innodb_file_per_table, innodb_large_prefix, innodb_file_format) and ENGINE, ROW_FORMAT and TABLE_COLLATION for Redmine tables. Any feedback or ideas to improve this task are welcome!
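
The attached patch is not reproduced here, so the following is only a rough Ruby sketch of what such a check could look like. The method names and the expected values (ON, ON, Barracuda, InnoDB, a Dynamic or Compressed row format, a utf8mb4_ collation) are assumptions based on the usual requirements for long index prefixes on MySQL 5.6 and early 5.7, not the contents of the patch. The input hashes are assumed to come from SHOW VARIABLES and information_schema.TABLES.

```ruby
# Sketch only; expected values are assumptions for MySQL 5.6 / early 5.7.
EXPECTED_VARIABLES = {
  "innodb_file_per_table" => "ON",
  "innodb_large_prefix"   => "ON",
  "innodb_file_format"    => "Barracuda"
}.freeze

# variables: Hash of server variable name => value, e.g. built from the rows
# returned by SHOW VARIABLES LIKE 'innodb_%'.
# Returns one message per variable that does not have the expected value.
def utf8mb4_variable_problems(variables)
  EXPECTED_VARIABLES.each_with_object([]) do |(name, expected), problems|
    actual = variables[name]
    next if actual.to_s.casecmp?(expected)
    problems << "#{name} is #{actual.inspect}, expected #{expected}"
  end
end

# tables: Array of Hashes holding the TABLE_NAME, ENGINE, ROW_FORMAT and
# TABLE_COLLATION columns of information_schema.TABLES.
# Returns the names of tables that do not look ready for utf8mb4.
def utf8mb4_table_problems(tables)
  not_ready = tables.reject do |t|
    t["ENGINE"] == "InnoDB" &&
      %w[dynamic compressed].include?(t["ROW_FORMAT"].to_s.downcase) &&
      t["TABLE_COLLATION"].to_s.start_with?("utf8mb4_")
  end
  not_ready.map { |t| t["TABLE_NAME"] }
end
```

Note that innodb_large_prefix and innodb_file_format no longer exist in MySQL 8.0, so a check like this would report them as missing there and is only meaningful on older servers.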

I'll try to create a Wiki page later covering the steps required to migrate an existing installation, or at least some useful links.

0001-Update-default-database-config-for-MySQL.patch (2.32 KB) Marius BALTEANU, 2019-09-03 09:17


Related issues

Related to Redmine - Defect #24242: Adding comments to ticket returns internal server error Needs feedback
Related to Redmine - Defect #28774: Internal Error when Submit the Description with Vietnames... Closed
Related to Redmine - Defect #27406: Internal Server Error while posting smile in issue's desc... Closed
Related to Redmine - Defect #25959: "Smile of the death" problem Closed
Related to Redmine - Defect #24992: MailHandler: an unexpected error occurred when receiving ... Needs feedback
Related to Redmine - Defect #23586: Create index on mysql exceed limits Needs feedback
Related to Redmine - Defect #22119: Error: Data too long for column 'notes' when copy paste p... Closed
Related to Redmine - Defect #21398: Mysql: 500 server error when submitting 4 bytes utf8 (to ... New
Related to Redmine - Defect #20143: Mailhandler cannot handle 4-byte characters New
Related to Redmine - Patch #19742: RedmineInstall: MySQL: collation_database Closed
Related to Redmine - Defect #18866: MySQL: disappear after 4-Byte UTF-8 New
Related to Redmine - Defect #30848: Error when creating issue with emoji in description New
Related to Redmine - Defect #27984: Arabic Support Issues Closed
Related to Redmine - Defect #27803: Can't post a smiley face in issues description Closed
Related to Redmine - Defect #27361: Failed when using Emoji Closed
Related to Redmine - Defect #27238: Mysql Error after upgrading Redmine from 2.5 to 3.* Closed
Related to Redmine - Defect #26386: Mysql: Unable to update ticket with Emoji Closed
Related to Redmine - Defect #24030: When SVN or Git repository has a commit comment include a... New
Related to Redmine - Defect #23557: Special (micro) character in message field causes interna... Closed
Related to Redmine - Defect #22618: subject utf-8 char vs mysql2 Closed
Related to Redmine - Defect #19334: Error 500 when uploading file with umlaut in filename fro... Closed
Related to Redmine - Defect #10772: 4-byte utf-8 characters Closed
Related to Redmine - Patch #32054: Add test for 4 byte characters (emoji) support New

Associated revisions

Revision 18454
Added by Go MAEDA 2 months ago

Update default database config for MySQL to support 4 byte characters (emoji) (#31921).

Patch by Marius BALTEANU and Go MAEDA.

History

#1 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #24242: Adding comments to ticket returns internal server error added

#2 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #28774: Internal Error when Submit the Description with Vietnamese in Unicode fonts added

#3 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #27406: Internal Server Error while posting smile in issue's description added

#4 Updated by Marius BALTEANU 3 months ago

#5 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #24992: MailHandler: an unexpected error occurred when receiving email: invalid byte sequence in UTF-8 added

#6 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #23586: Create index on mysql exceed limits added

#7 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #22119: Error: Data too long for column 'notes' when copy paste pictures added

#8 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #21398: Mysql: 500 server error when submitting 4 bytes utf8 (to be saved in the 'notes' field) added

#9 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #20143: Mailhandler cannot handle 4-byte characters added

#10 Updated by Marius BALTEANU 3 months ago

  • Description updated (diff)

#11 Updated by Marius BALTEANU 3 months ago

  • Related to Patch #19742: RedmineInstall: MySQL: collation_database added

#12 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #18866: MySQL: disappear after 4-Byte UTF-8 added

#13 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #30848: Error when creating issue with emoji in description added

#14 Updated by Marius BALTEANU 3 months ago

#15 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #27803: Can't post a smiley face in issues description added

#16 Updated by Marius BALTEANU 3 months ago

#17 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #27238: Mysql Error after upgrading Redmine from 2.5 to 3.* added

#18 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #26386: Mysql: Unable to update ticket with Emoji added

#19 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #24030: When SVN or Git repository has a commit comment include an emoji (4 bytes charactor), error occurs added

#20 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #23557: Special (micro) character in message field causes internal server error added

#21 Updated by Marius BALTEANU 3 months ago

#22 Updated by Marius BALTEANU 3 months ago

  • Related to Defect #19334: Error 500 when uploading file with umlaut in filename from Mac added

#23 Updated by Marius BALTEANU 3 months ago

#24 Updated by Go MAEDA 3 months ago

As mentioned in #19742#note-3, the collation utf8mb4_unicode_ci is really problematic for Japanese users. I think it is better to leave it to users without setting a default value. At least, we should not set utf8mb4_unicode_ci as the default value. Or, setting utf8mb4_general_ci or utf8mb4_bin is better.

The problem of utf8mb4_unicode_ci is that it treats many different kinds of characters as the same when comparing. For example, the following combinations of Japanese words are treated as the same words.

  • "はは" (means mother, pronounced as "haha") and "パパ" (means dad, pronounced as "papa")
  • "からす" (means a crow, pronounced as "karasu") and "ガラス" (means glass, pronounced as "garasu")
  • "カメラ" (means a camera, pronounced as "kamera") and "ガメラ" (means Gamera, pronounced as "gamera")
  • "パリ" (means Paris, pronounced as "pari") and "バリ" (means Bali in Indonesia, pronounced as "bari")

Imagine that searching for the word "からす" hits not only "からす" (a crow) but also "ガラス" (glass). Setting the collation to utf8mb4_unicode_ci makes the issue filter and full-text search unusable in some languages.
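
The effect can be imitated outside the database. The Ruby sketch below is not MySQL's collation algorithm; it is an illustration that ignores the voiced and semi-voiced sound marks and the hiragana/katakana distinction, which is enough to make the word pairs above compare as equal. The method name loose_kana_key is hypothetical.

```ruby
# Illustration only: NOT MySQL's collation algorithm. It imitates a comparison
# that ignores voiced/semi-voiced marks and the hiragana/katakana distinction.
def loose_kana_key(text)
  text.unicode_normalize(:nfd)   # split e.g. "ガ" into "カ" + U+3099
      .delete("\u3099\u309A")    # drop the combining (semi-)voiced sound marks
      .tr("ァ-ヶ", "ぁ-ゖ")      # fold katakana to hiragana
end

pairs = [
  ["はは", "パパ"],
  ["からす", "ガラス"],
  ["カメラ", "ガメラ"],
  ["パリ", "バリ"]
]

pairs.each do |a, b|
  loose = loose_kana_key(a) == loose_kana_key(b)  # loose, accent-insensitive style
  strict = a == b                                 # strict comparison, as a binary collation would do
  puts "#{a} / #{b}: loose=#{loose} strict=#{strict}"
end
```

Under the loose key every pair matches, while the strict comparison keeps them apart, which is the behavior the note above asks for when it suggests utf8mb4_bin.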

#25 Updated by Marius BALTEANU 3 months ago

  • File 0001-Update-default-database-config-for-MySQL.patch added
  • File 0002-Task-to-check-mysql-support-for-utf8mb4.patch added

Thanks for your detailed response, I totally missed that note and now I understand the problem.

I've removed collation from default settings.

#26 Updated by Marius BALTEANU 3 months ago

  • File deleted (0001-Update-default-database-config-for-MySQL.patch)

#27 Updated by Marius BALTEANU 3 months ago

  • File deleted (0002-Task-to-check-mysql-support-for-utf8mb4.patch)

#28 Updated by Marius BALTEANU 3 months ago

  • File deleted (0001-Update-default-database-config-for-MySQL.patch)

#29 Updated by Marius BALTEANU 3 months ago

  • File deleted (0002-Task-to-check-mysql-support-for-utf8mb4.patch)

#30 Updated by Marius BALTEANU 2 months ago

Go Maeda, is it OK from your point of view to set only the encoding, i.e. encoding: utf8mb4?

#31 Updated by Go MAEDA 2 months ago

Marius BALTEANU wrote:

Go Maeda, is it OK from your point of view to set only the encoding, i.e. encoding: utf8mb4?

Yes, I think it is OK.

But I am not sure whether the comment "# Remove encoding for existing installation on MySQL" is necessary.

It would be even better if there is a comment like "Use "utf8" instead of "utf8mb4" if you use MySQL earlier than #{VERSION}".

#32 Updated by Go MAEDA 2 months ago

I think the change against test/unit/issue_test.rb should not be merged because it may break tests on the official CI server if the encoding of the MySQL database on the CI server is not utf8mb4.
http://www.redmine.org/builds/

No one except Jean-Philippe Lang can update configurations of the CI server.

#33 Updated by Go MAEDA 2 months ago

I updated Marius's patch:

  • Removed "with ruby1.9". The Ruby version is unnecessary because Redmine no longer supports Ruby 1.8, which required the 'mysql' adapter instead of the 'mysql2' adapter. Please see the example file in Redmine 2.6 for reference: source:tags/2.6.9/config/database.yml.example
  • Removed the test so as not to break the CI server
  • Added the comment "Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7"
  • Removed "# Remove encoding for existing installation on MySQL" because an existing installation may require the encoding setting
diff --git a/config/database.yml.example b/config/database.yml.example
index 57bc51605..727b4d89b 100644
--- a/config/database.yml.example
+++ b/config/database.yml.example
@@ -1,4 +1,4 @@
-# Default setup is given for MySQL with ruby1.9.
+# Default setup is given for MySQL 5.7.7 or later.
 # Examples for PostgreSQL, SQLite3 and SQL Server can be found at the end.
 # Line indentation must be 2 spaces (no tabs).

@@ -8,7 +8,8 @@ production:
   host: localhost
   username: root
   password: "" 
-  encoding: utf8
+  # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+  encoding: utf8mb4

 development:
   adapter: mysql2
@@ -16,7 +17,8 @@ development:
   host: localhost
   username: root
   password: "" 
-  encoding: utf8
+  # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+  encoding: utf8mb4

 # Warning: The database defined as "test" will be erased and
 # re-generated from your development database when you run "rake".
@@ -27,7 +29,8 @@ test:
   host: localhost
   username: root
   password: "" 
-  encoding: utf8
+  # Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7
+  encoding: utf8mb4

 # PostgreSQL configuration example
 #production:
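
The version rule stated in the new comment can be sketched in Ruby as follows. This is a hypothetical helper (mysql_encoding_for is not part of Redmine or the patch); the 5.7.7 cutoff is taken from the comment in the diff above, and the version string is assumed to look like the output of SELECT VERSION().

```ruby
require "rubygems" # provides Gem::Version (loaded by default in modern Ruby)

# Cutoff taken from the comment added to database.yml.example.
UTF8MB4_MIN_VERSION = Gem::Version.new("5.7.7")

# Hypothetical helper: picks the `encoding` value for database.yml from a
# MySQL server version string such as "5.7.26-log".
def mysql_encoding_for(server_version)
  # Keep only the leading numeric part, e.g. "5.7.26-log" -> "5.7.26".
  numeric = server_version[/\A\d+(\.\d+)*/]
  raise ArgumentError, "unrecognized version: #{server_version.inspect}" unless numeric

  Gem::Version.new(numeric) >= UTF8MB4_MIN_VERSION ? "utf8mb4" : "utf8"
end

puts mysql_encoding_for("5.6.40")      # utf8
puts mysql_encoding_for("5.7.26-log")  # utf8mb4
```

Gem::Version is used instead of a plain string comparison so that a version such as "5.7.10" is ordered after "5.7.7".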

#34 Updated by Marius BALTEANU 2 months ago

  • Assignee deleted (Marius BALTEANU)
  • Target version changed from Candidate for next major release to 4.1.0

Go MAEDA wrote:

I updated Marius's patch:

  • Removed "with ruby1.9". The Ruby version is unnecessary because Redmine no longer supports Ruby 1.8, which required the 'mysql' adapter instead of the 'mysql2' adapter. Please see the example file in Redmine 2.6 for reference: source:tags/2.6.9/config/database.yml.example
  • Removed the test so as not to break the CI server
  • Added the comment "Use "utf8" instead of "utfmb4" for MySQL prior to 5.7.7"
  • Removed "# Remove encoding for existing installation on MySQL" because an existing installation may require the encoding setting

Thanks for updating my patch, I agree with your changes. Regarding the test, I'll create a new ticket assigned to Jean-Philippe to update the CI server. Until then, I propose to deliver this in 4.1.0.

#35 Updated by Go MAEDA 2 months ago

  • Status changed from New to Closed
  • Assignee set to Go MAEDA
  • Resolution set to Fixed

Committed the updated patch in #31921#note-33. Thanks.

#36 Updated by Marius BALTEANU 2 months ago

  • Related to Patch #32054: Add test for 4 byte characters (emoji) support added
