[Solved] - Emoticons/Emojii/4-byte Non-UTF-8 characters

Added by Matthew Paul almost 3 years ago

I logged this as a bug but it was rejected. But, it really is a bug, so I though I'd log the solution here in case anyone else has the same problem and wants that.

So, the problem is caused when a user sends an email, or pastes into the notes or description, text that includes many of the new emoticons that are now available. I thought it was a problem with Japanese characters but in fact, it's just that they use emoticons/emojii a lot more, so you can get it for US/English also.

So the problem is caused when you are using MySQL (i.e. a lot of people) and when you create the tables in the STANDARD UTF-8 character set, which, again, is a lot. In my case it's the bitnami stack which has pre-created those tables for me, but it isn't just bitnami stacks, it's a lot of other people will have this problem too.

So, the real solution is not to use UTF8 but instead UTF8MB4 which is the full 4 byte implementation of utf8. So you'd think that utf8 would actually be already fully able to handle that, but in fact in mysql it isn't. So, you have 2 choices -

1) Convert the various affected tables (issues, journals mostly) to UTF8MB4 - which I tried but I'm pretty useless at mysql so I couldn't get the syntax right. But that info is in here elsewhere as an issue I think.

2) You can simply strip/convert any 4 byte emoticon/emojii characters in redmine before you send that to the db. So, that's what this does.

So the solution, again, is kind of a hack because I'm not good enough to write a plugin or do it properly. But you can -

1) Install plugin custom_workflow. Which is awesome in any case.
2) Install this code as a flow for all projects. It will strip out any emoticons and replace with ? - again, if I were any good at ruby/redmine/mysql/anything I would try and do this better, but this works so... :-)

Ok, here it is (and es, that regex is really that big, lots of characters) -

regex = /[\u{00A9}\u{00AE}\u{203C}\u{2049}\u{2122}\u{2139}\u{2194}-\u{2199}\u{21A9}-\u{21AA}\u{231A}-\u{231B}\u{2328}\u{23CF}\u{23E9}-\u{23F3}\u{23F8}-\u{23FA}\u{24C2}\u{25AA}-\u{25AB}\u{25B6}\u{25C0}\u{25FB}-\u{25FE}\u{2600}-\u{2604}\u{260E}\u{2611}\u{2614}-\u{2615}\u{2618}\u{261D}\u{2620}\u{2622}-\u{2623}\u{2626}\u{262A}\u{262E}-\u{262F}\u{2638}-\u{263A}\u{2648}-\u{2653}\u{2660}\u{2663}\u{2665}-\u{2666}\u{2668}\u{267B}\u{267F}\u{2692}-\u{2694}\u{2696}-\u{2697}\u{2699}\u{269B}-\u{269C}\u{26A0}-\u{26A1}\u{26AA}-\u{26AB}\u{26B0}-\u{26B1}\u{26BD}-\u{26BE}\u{26C4}-\u{26C5}\u{26C8}\u{26CE}-\u{26CF}\u{26D1}\u{26D3}-\u{26D4}\u{26E9}-\u{26EA}\u{26F0}-\u{26F5}\u{26F7}-\u{26FA}\u{26FD}\u{2702}\u{2705}\u{2708}-\u{270D}\u{270F}\u{2712}\u{2714}\u{2716}\u{271D}\u{2721}\u{2728}\u{2733}-\u{2734}\u{2744}\u{2747}\u{274C}\u{274E}\u{2753}-\u{2755}\u{2757}\u{2763}-\u{2764}\u{2795}-\u{2797}\u{27A1}\u{27B0}\u{27BF}\u{2934}-\u{2935}\u{2B05}-\u{2B07}\u{2B1B}-\u{2B1C}\u{2B50}\u{2B55}\u{3030}\u{303D}\u{3297}\u{3299}\u{1F004}\u{1F0CF}\u{1F170}-\u{1F171}\u{1F17E}-\u{1F17F}\u{1F18E}\u{1F191}-\u{1F19A}\u{1F201}-\u{1F202}\u{1F21A}\u{1F22F}\u{1F232}-\u{1F23A}\u{1F250}-\u{1F251}\u{1F300}-\u{1F321}\u{1F324}-\u{1F393}\u{1F396}-\u{1F397}\u{1F399}-\u{1F39B}\u{1F39E}-\u{1F3F0}\u{1F3F3}-\u{1F3F5}\u{1F3F7}-\u{1F4FD}\u{1F4FF}-\u{1F53D}\u{1F549}-\u{1F54E}\u{1F550}-\u{1F567}\u{1F56F}-\u{1F570}\u{1F573}-\u{1F579}\u{1F587}\u{1F58A}-\u{1F58D}\u{1F590}\u{1F595}-\u{1F596}\u{1F5A5}\u{1F5A8}\u{1F5B1}-\u{1F5B2}\u{1F5BC}\u{1F5C2}-\u{1F5C4}\u{1F5D1}-\u{1F5D3}\u{1F5DC}-\u{1F5DE}\u{1F5E1}\u{1F5E3}\u{1F5EF}\u{1F5F3}\u{1F5FA}-\u{1F64F}\u{1F680}-\u{1F6C5}\u{1F6CB}-\u{1F6D0}\u{1F6E0}-\u{1F6E5}\u{1F6E9}\u{1F6EB}-\u{1F6EC}\u{1F6F0}\u{1F6F3}\u{1F910}-\u{1F918}\u{1F980}-\u{1F984}\u{1F9C0}]/

# description
if @issue.description_changed?
  if @issue.description
    self.description = [self.description,""].join
    self.description = self.description.gsub regex, '?'

# notes
if @issue.notes
  self.notes = [self.notes,""].join
  self.notes = self.notes.gsub regex, '?'

Ok, that's t, no answers, responses required, and not sure if I put this in the correct place, but like I say wanted to log it anyway.

Replies (3)

RE: Invalid Characters/Redmine crashes or rejects emails for non-UTF-8 characters - Solution - Added by Matthew Paul almost 3 years ago

PS - it is also a bug in this redmine, but whereas on my stack it actually crashes, here it simply cuts all text off after the offending 4 byte emoticon... which you obviously don't want to happen...

RE: [Solved] - Emoticons/Emojii/4-byte Non-UTF-8 characters - Added by Deoren Moor almost 3 years ago

Some related issues: #24992, #21398

#21398#note-10 in particular goes into a good level of detail.

We experience this with scraped emails (they're rejected) and attempts to post to an issue via the web UI (500 server error).