Feature #24681

Syntax highlighter: replace CodeRay with Rouge

Added by Go MAEDA 10 months ago. Updated about 1 month ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: Mischa The Evil
% Done: 0%
Category: Text formatting
Target version: Candidate for next major release
Resolution:

Description

I propose replacing CodeRay with another syntax highlighter, Rouge. It supports 100+ languages.

The current syntax highlighter, CodeRay, does not support popular languages such as C#, Visual Basic, Objective-C and Swift. Unfortunately, CodeRay is not very actively developed now, so it is unlikely that it will add support for those languages.

Citation from Rouge's README.md:

Advantages to CodeRay
  • The HTML output from Rouge is fully compatible with stylesheets designed for pygments.
  • The lexers are implemented with a dedicated DSL, rather than being hand-coded.
  • Rouge supports every language CodeRay does and more.

redmine_rouge_plugin_csharp.png (17.2 KB) Go MAEDA, 2016-12-26 05:16

0001-Replace-syntax-highlighter-CodeRay-with-Rouge.patch (13 KB) Go MAEDA, 2016-12-28 07:08

highlight-sample.png (88.6 KB) Go MAEDA, 2016-12-28 07:46

without_opening_tag.png (2.1 KB) Marius BALTEANU, 2016-12-28 10:38

with_open_tag.png (3.14 KB) Marius BALTEANU, 2016-12-28 10:38

highlight-php.png (15.2 KB) Go MAEDA, 2016-12-29 06:46

0003-Update-jstoolbar-for-Rouge-syntax-highlighter.patch (23.7 KB) Go MAEDA, 2016-12-29 06:52

0004-s-CodeRay-Rouge.patch (830 Bytes) Go MAEDA, 2016-12-29 06:52

0005-Support-PHP-snippets-without-open-tag.patch (3.34 KB) Go MAEDA, 2016-12-29 06:52

0002-Update-help-for-Rouge-syntax-highlighter.patch (457 KB) Go MAEDA, 2016-12-29 06:53

0006-Fixed-multiline-comments-highlighting-issue-in-file-.patch (1.64 KB) Go MAEDA, 2016-12-29 16:46

multiline-comment-before.png - multiline comment issue : before fix (15.7 KB) Go MAEDA, 2016-12-29 16:46

multiline-comment-after.png - multiline comment issue : after fix (15.7 KB) Go MAEDA, 2016-12-29 16:49

0001-Syntax-highlighter-fall-back-to-Rouge-if-the-languag.patch (11.8 KB) Go MAEDA, 2017-01-02 12:08


Related issues

Related to Redmine - Feature #2623: C# syntax highlighting New 2009-01-30
Related to Redmine - Feature #3032: Use google Prettify for syntax highlighting instead of Co... New 2009-03-23
Related to Redmine - Feature #1313: Optionally use ultraviolet for syntax highlighting New 2008-05-27
Related to Redmine - Patch #1651: Hack to make redmine use pygmentize instead of CodeRay New 2008-07-15

History

#1 Updated by Go MAEDA 10 months ago

#2 Updated by Go MAEDA 10 months ago

  • Related to Feature #3032: Use google Prettify for syntax highlighting instead of CodeRay added

#3 Updated by Go MAEDA 10 months ago

  • Related to Feature #1313: Optionally use ultraviolet for syntax highlighting added

#4 Updated by Go MAEDA 10 months ago

  • Related to Patch #1651: Hack to make redmine use pygmentize instead of CodeRay added

#5 Updated by Mischa The Evil 10 months ago

Just for the information: see also the redmine_rouge (https://github.com/ngyuki/redmine_rouge) plugin.

#6 Updated by Go MAEDA 10 months ago

Mischa The Evil wrote:

Just for the information: see also the redmine_rouge (https://github.com/ngyuki/redmine_rouge) plugin.

Thanks for the information. The plugin works fine on the current trunk (r16111).
It seems that we can implement this feature in a small amount of code.

#7 Updated by Go MAEDA 10 months ago

This is a patch to replace CodeRay with Rouge.

With this patch applied, we can highlight 100+ languages including C# (csharp), Visual Basic (vb), Objective-C (objective_c), Swift (swift) and Perl (perl).

Users can use all language classes (code class="XXX") currently supported by CodeRay except for taskpaper.

Since the number of supported languages increases dramatically from 24 to 100+, I think this patch offers users a very large benefit.

#8 Updated by Marius BALTEANU 10 months ago

I tested the patch and I have the following observations:
  1. We should rename the "codeRay" variable in source:trunk/public/javascripts/jstoolbar/jstoolbar.js#L378
  2. Rename the "CodeRay" from source:trunk/public/stylesheets/rtl.css#L375
  3. Update help documentation: source:trunk/public/help/en/wiki_syntax_detailed_markdown.html#L300

Also, the Rouge library seems to have an issue with the PHP language, which is highlighted only when the opening tag is present.

Without open tag: (see without_opening_tag.png)

With open tag: (see with_open_tag.png)

I think it is related to this issue. I tried the workaround from there and it doesn't work.

#9 Updated by Go MAEDA 10 months ago

Marius BALTEANU wrote:

I tested the patch and I have the following observations:
  1. We should rename the "codeRay" variable in source:trunk/public/javascripts/jstoolbar/jstoolbar.js#L378
  2. Rename the "CodeRay" from source:trunk/public/stylesheets/rtl.css#L375
  3. Update help documentation: source:trunk/public/help/en/wiki_syntax_detailed_markdown.html#L300

Thanks for your feedback, I have fixed all of the above.

Also, the Rouge library seems to have an issue with the PHP language, which is highlighted only when the opening tag is present.

Also fixed.

#10 Updated by Go MAEDA 10 months ago

Fixed multiline comments highlighting issue in file view, the same problem that was reported as #7495 for CodeRay.

Before fix:
multiline comment issue : before fix

After fix:
multiline comment issue : after fix

#11 Updated by Go MAEDA 10 months ago

  • Description updated (diff)

Rouge 2.0.7 supports 113 languages.
The following is the list of supported languages.

#12 Updated by Go MAEDA 10 months ago

  • Target version set to Candidate for next major release

#13 Updated by Mischa The Evil 10 months ago

  • Assignee set to Mischa The Evil

I'm currently wrapping-up a review of the proposed change and the corresponding patches provided by Go.

#14 Updated by Mischa The Evil 10 months ago

  • Status changed from New to Needs feedback
  • Assignee deleted (Mischa The Evil)

Done! Here it is...


I have spent some time testing the Rouge syntax highlighter using the patches provided by Go in contrast to the implementation provided by the redmine_rouge plugin (which seems to have an issue with the CSS-styles — thus the highlighting — missing during printing).
I specifically looked at the Ruby code syntax highlighting, but I also included two others: Diff and HTML (with inline CSS and Javascript).

Procedure:

I started with collecting some — let's say — more challenging code examples. I found some interesting ones (specifically for comparison of CodeRay vs. Rouge) in the CodeRay Scanner Test repository (https://github.com/rubychan/coderay-scanner-tests). I also took at random some long, more complicated methods from the Redmine core to compare.

With these examples in place, I added all the code to a single wikipage and dropped the files into the projects repository for testing of the repo view file function (my findings for the single wikipage are not different from the projects repository view file functionality, so I won't go into this specifically). I then printed the whole wikipage to a PDF (using Chrome's internal print function — this is actually what's not working using the redmine_rouge plugin) for easy and consistent comparison.

I did these tests (for lack of anything better) on a fresh deployment of a Bitnami Redmine 3.3.1 stack. First I printed the wikipage as described above using the default CodeRay highlighter. Then I applied the patches manually (cherry-picking patches 0001, 0005 and 0006; omitting patches 0002, 0003 and 0004), restarted the whole stack and printed the same wikipage using the Rouge highlighter. Note that I have thus tested Rouge's capabilities using the currently patched-in colorful style theme.

I think that the results I got (two 9+ MB, 25-page, A3-landscape PDFs) are actually pretty clear, albeit large. Sneak peek: I'm not very excited about them. I don't talk about performance here (I haven't tested that), but rather about the quality of the code highlighting provided by the Rouge library, particularly for the Ruby and Diff code. Along with this, I also see some issues with the current CodeRay highlighter (whether these are issues in the CodeRay library or the Redmine implementation, I don't know at the moment). Below I'll elaborate on my observations during the review, using numbered items and referring to the filenames of the test files I have used. I have written it such that one can read/scan along the test code in the PDFs (preferably split-screen) in roughly the same order.

Observations:

No.: Filename: Comments:
Ruby code from CodeRay Scanner Test repository
o01. [def.in.rb] Rouge breaks on the ampersands in front of the blocks.
o02. [diffed.in.rb] With Rouge no clear difference between regex and string highlighting, and operator keywords aren't highlighted.
o03. [operators.in.rb] Rouge doesn't recognize the aliases, gives the at signs the error class and doesn't highlight method definitions where it should.
o04. [quotes.in.rb] Rouge breaks on the complex quoted literals and doesn't distinguish between different parts of regexes. CodeRay seems to break on the ampersand within the regex.
o05. [regexp.in.rb] Rouge doesn't highlight code inside regexes, doesn't highlight escape sequences within regexes.
o06. [ruby19.in.rb] Rouge seems to have problems taking in the Ruby 1.9 hash syntaxes.
o07. [ruby2.in.rb] Rouge breaks on the ampersands, messes up the keyword argument symbol highlighting, highlights extend as a pseudo-keyword, highlights self incorrectly, doesn't (clearly) distinguish between the %i{} and %w{} literals (maybe even some other %* literals too), highlights Ruby 2.1 syntax wrong, doesn't highlight Ruby 2.2 hash literal symbol keys with colons/quotes correctly and has issues highlighting the Ruby 2.3 squiggly heredoc. CodeRay doesn't distinguish class names from module names in definitions, which Rouge does.
o08. [strange.in.rb] Rouge does not seem to distinguish floats from constants correctly, does not recognize backticked shell code (using both ` and %x), does not distinguish between inline instance, class and global variables, doesn't distinguish modifiers within regex constructs, misses several distinctions (char vs. content, delimiter vs. constant), and has trouble with nested code, which completely breaks the highlighting of the rest of the code.
o09. [undef.in.rb] Rouge breaks on the / method. Both Rouge and CodeRay break on the ampersands.
o10. [unicode.in.rb] Rouge does not seem to distinguish between integers and floats, and it renders unicode chars with class error, which seems wrong.
Diff 'code' from CodeRay Scanner Test repository
o11. [diff.in.diff] Rouge differs from CodeRay in what it highlights: Rouge seems to put the attention more on the metadata of the patch (diff command, indexes) and CodeRay more on the changes themselves (filename, line numbers). Rouge doesn't support inline change highlighting at all.
o12. [github.in.diff] See above. No inline change highlight.
o13. [heredoc.in.diff] CodeRay seems to do some inline highlighting of heredocs which Rouge doesn't (this seems to be a curious thing as far as I understand it fully).
HTML code from CodeRay Scanner Test repository
o14. [cdata.in.html] Rouge does not mark the inside of the cdata blocks in the inline javascript and css comments and renders the < as a unicode char with class error.
o15. [redmine.in.html] Rouge has the same cdata block issue as above. Rouge doesn't highlight inline css and javascript, and does not highlight html entities. Redmine breaks both highlighters due to incorrect handling of code tags.
Ruby code from redmine source and issue
o16. [application_helper.rb] Rouge incorrectly highlights attr as a pseudo-keyword. CodeRay has more distinct highlighting of escape sequences within regexes in contrast to Rouge. Both CodeRay and Rouge break on the ampersands (Rouge multiple times, CodeRay once). CodeRay provides more distinct string highlighting. CodeRay provides better distinction of regexes vs. strings than Rouge. CodeRay does not specially highlight built-in methods (name, const_defined?, lambda, class_eval, etc.) the way Rouge does (as they are just methods, which CodeRay doesn't highlight in any way).
o17. [multiline_comment.rb] Rouge omits highlighting of the she-bang line.
o18. [project_nested_set.rb] Here it becomes clear IMO that Rouge's method highlighting can be a really good thing (see e.g. the calls to the *_changed? methods within the lambda and the method chains of self.class.where...where...maximum and self.class.where...pluck...first, etc.).
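
For readers who do not have the PDFs at hand, here is a small hand-written sample of the kinds of Ruby constructs named in observations o01 and o05 to o08. It is not taken from the CodeRay scanner test files; all names in it (each_upcased, describe, options and so on) are made up for illustration only.

```ruby
# Hypothetical sample (not from the CodeRay scanner tests) collecting
# constructs mentioned in the observations above.

# o01/o07/o09: ampersand in front of a block argument
def each_upcased(words, &block)
  words.map(&:upcase).each(&block)
end

# o07: keyword arguments
def describe(name:, greeting: 'Hello')
  "#{greeting}, #{name}!"
end

# o06: Ruby 1.9 hash syntax; o07: Ruby 2.2 quoted symbol key
options = { color: 'red', "font-size": 12 }

# o07: %w{} builds an array of strings, %i{} an array of symbols
words   = %w{alpha beta}
symbols = %i{alpha beta}

# o07: Ruby 2.3 squiggly heredoc (strips the common leading whitespace)
banner = <<~TEXT
  first line
    indented line
TEXT

# o05: escape sequences and interpolated code inside a regex
pattern = /\A\d+-#{words.first}\z/

# o08: backticked shell code would look like `ls` or %x(ls); it is only
# mentioned in this comment so that the sample has no side effects.
```

Pasting a sample like this into a wiki page under both highlighters is a quick way to reproduce a few of the differences without the full test set.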

Two of the biggest differences that seem to be recurring along a multitude of the Ruby testfiles are due to differences in design perspective:

x1: Rouge highlights 'pseudo-keywords', CodeRay doesn't — for good reasons AFAICR.
x2: Rouge highlights methods (class 'nf'), CodeRay doesn't — again, for good reasons AFAICR.

Some conclusions:

A1: The tested patch series successfully implements the replacement of CodeRay with Rouge (excluding conclusions B1 and B2, in case these are due to Redmine implementation(s)).
A2: The reported PHP issue and the additionally found multiline issue are successfully fixed by the additional patches.
A3: We might be able to tweak the js-toolbar code button in some neat way to support (a selection of) popular languages added by Rouge.

B1: CodeRay (and Rouge) highlighting (within the current Redmine implementation) can break on code tags within (at least) HTML code.
B2: CodeRay (and [significantly more] Rouge) highlighting (within the current Redmine implementation) can break on ampersands within (at least) Ruby code.

C1: Rouge has a lot (more) issues highlighting Ruby/HTML/JS/CSS/Diff code correctly/sufficiently.
C2: Rouge misses some more essential Diff highlighting features.
C3: Rouge misses support for inline CSS/JS highlighting within HTML code.

D1: Overall I think that CodeRay performs better judging by the quality and features of the provided highlighting of the tested languages.
D2: Rouge does indeed provide support for way more languages than CodeRay does (and I really like that), but if the same kind of issues are present in those languages too, I think we'd just be making a bad trade-off of quantity over quality if we switch away from CodeRay (at least at the moment).

As the observant reader may have already noticed above, I am a bit disappointed by the overall quality of the highlighting of the Rouge library in comparison to CodeRay. If you'd ask me whether or not to replace CodeRay with Rouge, I'd vote against such a switch now.
I do however think that a lot of people are willing to make this trade-off considering the number of languages supported by Rouge. As such, it could be a good idea to build upon the changes from #2985 and integrate additional support for the Rouge highlighter within the core while keeping CodeRay as the default (with a front-end UI in the form of a setting, just as is done for the Redmine text formatting and as was originally proposed by Jean-Baptiste in #2985). That shouldn't be too difficult to get done, and it gives users an easy way to make the choice for themselves without having to rely on a third-party plugin.
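
As a rough illustration of the setting-based idea, the following is a minimal sketch of a registry from which a highlighter is picked by name, with CodeRay as the default. Everything in it is an assumption for illustration: HighlighterRegistry, for_setting and the two lambdas are hypothetical stand-ins, not the code from #2985 and not the real CodeRay or Rouge APIs.

```ruby
require 'erb'

# Hypothetical registry: highlighter name => callable(text, language) -> HTML.
class HighlighterRegistry
  def initialize(default:)
    @default = default
    @highlighters = {}
  end

  def register(name, callable)
    @highlighters[name] = callable
  end

  # setting: the name chosen by the admin; nil or an unknown name
  # selects the default highlighter instead.
  def for_setting(setting)
    @highlighters.fetch(setting) { @highlighters.fetch(@default) }
  end
end

# Stand-in highlighters that only wrap the escaped text in a marker span.
registry = HighlighterRegistry.new(default: 'CodeRay')
registry.register('CodeRay', ->(text, _lang) { "<span class=\"CodeRay\">#{ERB::Util.h(text)}</span>" })
registry.register('Rouge',   ->(text, _lang) { "<span class=\"rouge\">#{ERB::Util.h(text)}</span>" })
```

With such a structure the default stays CodeRay unless the setting explicitly names another registered highlighter.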


Some additional comments:

Go MAEDA wrote:

Citation from Rouge's README.md :

Advantages to CodeRay
  • The HTML output from Rouge is fully compatible with stylesheets designed for pygments.
  • The lexers are implemented with a dedicated DSL, rather than being hand-coded.
  • Rouge supports every language CodeRay does and more.

Regarding the first: I don't see how this is an advantage of Rouge over CodeRay for the end user. Regarding the second: I think this may well be a drawback, being (one of) the cause(s) of some or more of the Ruby highlighting issues observed above in the test files. Regarding the third: well, I see taskpaper (which indeed isn't a language, true ;). Besides that one, I really think that syntax highlighting is only useful when it is done well. Done wrong, it's just pretty-coloring code and as such a waste of processor cycles. In my opinion quality should go way over quantity on this matter.

Go MAEDA wrote:

Marius BALTEANU wrote:

I tested the patch and I have the following observations:
  1. We should rename the "codeRay" variable in source:trunk/public/javascripts/jstoolbar/jstoolbar.js#L378
  2. Rename the "CodeRay" from source:trunk/public/stylesheets/rtl.css#L375
    [...]

Thanks for your feedback, I have fixed all of the above.

Regarding the first: you seem to know how the minified js file is created, as you seem able to patch it. I (and I'm sure some others too) am quite interested in the process used. Would you like to elaborate on that? Regarding the second: as far as I know the linenumbers styles became unused after r10131 and were removed from the regular stylesheets with r14487. As such, patch 0004 is obsolete and should be replaced by a complete removal of both the comment and the actual style definition from rtl.css.

Some additional comments about the in-/output files/code:

Total files: 4 (22 unpacked):
  • One plain-text input document containing the content (of a wikipage) in textile: codehighlight.examples.txt;
  • Two pdf output documents containing the rendered content from codehighlight.examples.txt as a wikipage for both highlighters: codehighlight.examples.coderay.pdf and codehighlight.examples.rouge.pdf;
  • One zip-archive containing eighteen individual test code files: syntaxhl-testfiles.zip.

Because of the size limitations on redmine.org I have uploaded these four files to my old MediaFire-account (http://www.mediafire.com/evildev). I uploaded them to the subfolder 'rm24681' of the 'Redmine Miscellaneous' folder. A direct link to this shared folder is: https://www.mediafire.com/folder/9994htt2r7fdg/rm24681

Wrap-up:

If you have made it this far, yay. Seriously, apologies for the lengthy post! I thought, let's start this new year by posting a nice detailed review... Best wishes for 2017 to all participants of this issue and beyond!

I'm interested in (further) feedback and willing to answer/participate in additional questions/discussions. Other experiences (eg. with other languages) from users using this patch or the plugin are very welcome if you ask me.

Kind regards, Mischa.

#15 Updated by Go MAEDA 10 months ago

Happy new year, Mischa. I am very impressed by, and deeply grateful for, your thorough inspection of Rouge and the patches.

As you wrote, I have to admit that there are some problems for now.

I do however think that a lot of people are willing to make this trade-off considering the number of languages supported by Rouge. As such, it could be a good idea to build upon the changes from #2985 and integrate additional support for the Rouge highlighter within the core while keeping CodeRay as the default (with a front-end UI in the form of a setting, just as is done for the Redmine text formatting and as was originally proposed by Jean-Baptiste in #2985).

How about using both CodeRay and Rouge? Syntax highlighting would normally be done by CodeRay and fall back to Rouge if the language is not supported by CodeRay. With this method, we can increase the number of supported languages without degrading the quality of syntax highlighting. In addition, it avoids adding more admin settings.

Regarding the first: you seem to know how the minified js file is created, as you seem able to patch it. I (and I'm sure some others too) am quite interested in the process used. Would you like to elaborate on that?

I just replaced the variable name with an editor.

Regarding the second: as far as I know the linenumbers styles became unused after r10131 and were removed from the regular stylesheets with r14487. As such, patch 0004 is obsolete and should be replaced by a complete removal of both the comment and the actual style definition from rtl.css.

Thanks, I will fix it.

#16 Updated by Go MAEDA 10 months ago

Go MAEDA wrote:

How about using both CodeRay and Rouge? Syntax highlighting would normally be done by CodeRay and fall back to Rouge if the language is not supported by CodeRay. With this method, we can increase the number of supported languages without degrading the quality of syntax highlighting. In addition, it avoids adding more admin settings.

I have created a new patch with a new approach: 0001-Syntax-highlighter-fall-back-to-Rouge-if-the-languag.patch

  • Syntax highlighting is mainly done by CodeRay.
  • Rouge is used only when the given language is not supported by CodeRay.

With this fallback mechanism we can increase the number of languages supported by syntax highlighting while still enjoying the high quality of CodeRay.
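
The two rules above can be sketched roughly as follows. This is only an illustration of the fallback idea, not the code in the attached patch: FallbackHighlighter and the coderay_like and rouge_like tables are hypothetical stand-ins, and the lambdas merely mark which backend was chosen instead of calling the real libraries.

```ruby
require 'erb'

# Tries each backend in order and uses the first one that knows the language.
class FallbackHighlighter
  # backends: hashes mapping a language name to a callable that takes
  # the source text and returns HTML.
  def initialize(*backends)
    @backends = backends
  end

  def highlight(text, language)
    key = language.to_s.downcase
    backend = @backends.find { |b| b.key?(key) }
    return ERB::Util.h(text) unless backend # unknown language: escape only
    backend[key].call(text)
  end
end

# Hypothetical stand-ins for the primary and the fallback library.
coderay_like = {
  'ruby' => ->(t) { "<span class=\"CodeRay\">#{ERB::Util.h(t)}</span>" }
}
rouge_like = {
  'ruby'   => ->(t) { "<span class=\"rouge\">#{ERB::Util.h(t)}</span>" },
  'csharp' => ->(t) { "<span class=\"rouge\">#{ERB::Util.h(t)}</span>" }
}

highlighter = FallbackHighlighter.new(coderay_like, rouge_like)
```

In this sketch a language known to both backends (ruby) goes to the first one, a language known only to the second (csharp) falls through to it, and anything else is simply escaped.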

#17 Updated by Go MAEDA 10 months ago

  • Assignee set to Mischa The Evil

Mischa, could you review the new patch on #24681#note-16?

0001-Syntax-highlighter-fall-back-to-Rouge-if-the-languag.patch

I think it can make use of the good points of both CodeRay and Rouge.

#18 Updated by Mischa The Evil 9 months ago

Go MAEDA wrote:

Mischa, could you review the new patch on #24681#note-16?

I will, and I'll also try yet another approach building upon yours.
I have a quick question in advance though (which holds true for both patch approaches): the unpatched highlight_by_filename seems to render unknown content (language == false) using ERB::Util.h, and this behaviour seems to change in both your patches, while highlight_by_language seems to gain this behaviour by setting lexer to ::Rouge::Lexers::PlainText when the content isn't recognized. Is this correct and wanted? Can you elaborate on that?

#19 Updated by Go MAEDA 9 months ago

Mischa The Evil wrote:

I have a quick question in advance though (which holds true for both patch approaches): the unpatched highlight_by_filename seems to render unknown content (language == false) using ERB::Util.h, and this behaviour seems to change in both your patches, while highlight_by_language seems to gain this behaviour by setting lexer to ::Rouge::Lexers::PlainText when the content isn't recognized. Is this correct and wanted? Can you elaborate on that?

I wrote the code to avoid the "RuntimeError: unknown lexer" exception. But I looked at the original syntax_highlight.rb again and realized that the exception will be caught in Redmine::SyntaxHighlighting.highlight_by_language, after which ERB::Util.h(text) will be performed. So I don't have to set the lexer to ::Rouge::Lexers::PlainText.
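
The rescue behaviour described here can be sketched like this. It is an assumption-laden illustration, not the actual body of Redmine::SyntaxHighlighting.highlight_by_language: highlight_or_escape and the strict lambda are hypothetical names, and only ERB::Util.h is a real call.

```ruby
require 'erb'

# Hypothetical wrapper: run a highlighter callable and fall back to plain
# HTML escaping if it raises (for example with "unknown lexer").
def highlight_or_escape(text, language, highlighter)
  highlighter.call(text, language)
rescue StandardError
  ERB::Util.h(text)
end

# Stand-in highlighter that only knows Ruby; anything else raises.
strict = lambda do |text, language|
  raise "unknown lexer #{language}" unless language == 'ruby'
  "<span class=\"ruby\">#{ERB::Util.h(text)}</span>"
end
```

Because the error is rescued in the wrapper, the highlighter itself does not need a plain-text special case for unknown languages.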

Thanks for pointing it out.

#20 Updated by Go MAEDA 9 months ago

Mischa The Evil wrote:

I have a quick question in advance though (which holds true for both patch approaches): the unpatched highlight_by_filename seems to render unknown content (language == false) using ERB::Util.h, and this behaviour seems to change in both your patches, while highlight_by_language seems to gain this behaviour by setting lexer to ::Rouge::Lexers::PlainText when the content isn't recognized. Is this correct and wanted? Can you elaborate on that?

I think there is no problem with using ::Rouge::Lexers::PlainText as a lexer, because it does nothing, and in the end Rouge.highlight(text, ::Rouge::Lexers::PlainText, ::Rouge::Formatters::HTML) returns almost the same HTML as ERB::Util.h.

The patched version of highlight_by_filename also uses ::Rouge::Lexers::PlainText for unknown languages, because ::Rouge::Lexer.guess_by_filename returns ::Rouge::Lexers::PlainText if Rouge cannot guess the language.

#21 Updated by Go MAEDA 9 months ago

  • Status changed from Needs feedback to New
  • Target version changed from Candidate for next major release to 3.4.0

The patch 0001-Syntax-highlighter-fall-back-to-Rouge-if-the-languag.patch is ready to merge. I am sure that supporting over 100 languages brings great benefits to users. Let's include this feature in 3.4.0.

But I cannot write good English, so it is hard for me to update public/help/*/wiki_syntax_detailed_*.html. Could someone add an explanation of Rouge to the help files?

#22 Updated by Mischa The Evil 9 months ago

Go MAEDA wrote:

The patch 0001-Syntax-highlighter-fall-back-to-Rouge-if-the-languag.patch is ready to merge. [...] Let's include this feature in 3.4.0.

As I've said in note-18, I have been working on extensively reviewing this. My preliminary conclusion is that the here mentioned patch should not yet be committed as its approach has some (major) drawback(s) and the implementation as-is comes with a big performance drawback. I'll wrap-up my review (including some clear data supporting my previous performance statement) and create/wrap-up an alternative patch this weekend.

#23 Updated by Jean-Philippe Lang 9 months ago

  • Target version changed from 3.4.0 to Candidate for next major release

Thanks Mischa for reviewing this.

I didn't try the patches but it looks like CodeRay would still be used in some cases. Am I wrong? Because I'm not really in favor of keeping 2 code highlighters in the core.

#24 Updated by Hontvári Levente 8 months ago

Jean-Philippe, FYI: the conclusion of the discussion was that CodeRay's syntax highlighting seems to be of significantly better quality for all 3 tested languages, but its list of supported languages is minuscule. That is why the combined approach was implemented in the final patch.

#25 Updated by Adrien Crivelli 4 months ago

This might be inappropriate, but have you considered re-using what GitHub uses?

I'd say they should have a rather mature solution to support lots of languages while keeping good quality. It seems to live here: https://github.com/github/linguist

#26 Updated by Kornelius Kalnbach about 1 month ago

For what it's worth, I think the idea of using CodeRay with a Rouge fallback is awesome. If you can make it work (the HTML output and formatting is pretty much incompatible), it would be the best of both worlds. I would also propose that for some languages, Rouge support is better (eg. Java and PHP).

As the maintainer of CodeRay, I can assure you that it will never support 100+ languages - at least not in its current form (version 2.0 is vaporware so far). Rouge is actively being developed, and support for JavaScript ES6, Swift, or other nice modern languages will probably never come to CodeRay.

@Mischa: Wow, awesome analysis! It may be a bit unfair to use CodeRay's test suite to compare Rouge, but the quality vs. quantity aspect is the whole reason CodeRay exists. I think the ampersand issues are related to the Redmine integration, though.

@Adrien: Linguist is great, but it requires Python as far as I'm aware…which would make it a bad fit for Redmine, right?

@Jean-Philippe: If there would be a "coderouge" meta-gem, combining coderay and rouge, would you accept it as "1 code highlighter"? I can reach out to jneen, the maintainer of Rouge, and see if we can release such a thing. Might even be useful for other projects.

#27 Updated by Frank Church about 1 month ago

I am reading this with interest as I use a number of languages CodeRay doesn't support.

Is there a separate ready-made plugin available for Rouge or does it require the patch in question?
