Patch #32424

CommonMark Markdown Text Formatting

Added by Jens Krämer about 1 month ago. Updated about 20 hours ago.

Status:NewStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:Text formatting
Target version:4.2.0

Description

This patch introduces a new text formatting named CommonMark Markdown (GitHub Flavored). It is based on CommonMarker and HTMLPipeline. The formatter was extracted from Planio where it will soon become the default for new accounts.

We built this instead of going with the existing RedCarpet Markdown implementation for a number of reasons:

  • From time to time users who are using the current Markdown formatter ask for a spec / formal list of all supported features. No such thing exists for RedCarpet. There is CommonMark but RedCarpet isn't going to support it in the short to medium term (see next point).
  • The future development of RedCarpet is uncertain. Few excerpts from a GitHub issue , a year ago:

Commonmark won't be supported anytime soon

and

A general message about the project for people skimming this thread: I'm sorry Redcarpet isn't really active anymore but my resources are pretty limited since I'm committed to other open source projects and still a student. Feel free to use any existing alternative ; this project isn't the great tool it used to be when it was maintained by Vicent.

  • With CommonMark evolving as a Markdown spec that is supported by many implementations and endorsed by organizations like Gitlab and GitHub (which both did the switch from RedCarpet to CommonMarker a while ago), it quickly becomes what users expect when they hear 'Markdown'.
  • Migrating existing Textile content is a bit easier since Pandoc has a dedicated Github Flavored Markdown writer module.
  • The HTML pipeline approach encourages splitting up the formatting process into it's different aspects (html generation, sanitizing, syntax highlighting etc) which allows for better testability and has potential for future re-use of components with other text formatters. Further, HTML pipeline filters work on actual DOM nodes, making tasks like adding classes to links etc much more straight forward and less prone to bugs than doing so with regular expressions.

Last but not least, this formatter solves a number of currently open issues regarding the RedCarpet based Markdown Formatter:

  • #19880 (Incorrect syntax for links in Markdown)
  • #20497 (Markdown formatting supporting HTML)
  • #20841 (Bare URLs in Markdown don't have "external" class)
  • #29172 (Markdown: External links broken)

The main reason why we want to introduce this as a third formatting option and not just replace the existing Markdown formatter is line endings - this formatter does not insert hard breaks (<br>) for simple newlines, but requires either a \ or two spaces at the end of the line to do so. Over time the new formatter could become the default and the RedCarpet based one might be renamed to Markdown (legacy) or similar.

English help files are included, and the patch makes sure these are delivered for any other languages as well until corresponding localized versions are available.

0001-CommonMark-Markdown-text-formatter.patch Magnifier (68.5 KB) Jens Krämer, 2019-11-06 09:27

0002-attachments_helper-commonmark.patch Magnifier - Use common_mark format for rendering Markdown attachments (725 Bytes) Martin Cizek, 2019-12-08 23:42


Related issues

Related to Redmine - Defect #29172: Markdown: External links broken New
Related to Redmine - Defect #20841: Bare URLs in Markdown don't have "external" class New
Related to Redmine - Feature #20497: Markdown formatting supporting HTML New
Related to Redmine - Defect #19880: Incorrect syntax for links in Markdown New
Related to Redmine - Defect #32563: Redmine 4 crashing with SEGFAULT under stress test when M... New

History

#1 Updated by Marius BALTEANU about 1 month ago

I really, really like the proposed patch, especially for the HTML Pipeline gem that can help us a lot to improve the current code.

#2 Updated by Jan from Planio www.plan.io about 1 month ago

  • Related to Defect #29172: Markdown: External links broken added

#3 Updated by Jan from Planio www.plan.io about 1 month ago

  • Related to Defect #20841: Bare URLs in Markdown don't have "external" class added

#4 Updated by Jan from Planio www.plan.io about 1 month ago

  • Related to Feature #20497: Markdown formatting supporting HTML added

#5 Updated by Jan from Planio www.plan.io about 1 month ago

  • Related to Defect #19880: Incorrect syntax for links in Markdown added

#6 Updated by Go MAEDA 17 days ago

  • Target version set to Candidate for next major release

#7 Updated by Go MAEDA 6 days ago

  • Target version changed from Candidate for next major release to 4.2.0

I agree that Redcarpet is not active anymore. The number of commits made in this year is only 3. And I think many Markdown users expect Redmine to behave as CommonMark/GFM compliant because many apps/services that support it.

Let's start discussion to deliver this in 4.2.0.

#8 Updated by Go MAEDA 5 days ago

  • Related to Defect #32563: Redmine 4 crashing with SEGFAULT under stress test when Markdown is used added

#9 Updated by Martin Cizek 4 days ago

Love you guys! We are playing with markup format conversions and preparing bulletproof arguments to integrate commonmarker + HTML sanitization for almost half a year. :) Jens did this job perfectly and the only unused argument regarding Redcarpet was #32563. I even tried to make a bulletproof pull request to solve an unpleasant Redcarpet issue observed in our Textile->Markdown migration (since 2013) to test if the project is really dead. It is.

Just a very small improvement is attached.

I'd sign every word written by Jens and would like to share a few remarks in subsequent comments, hope they help.

Go MAEDA, would you mind adding #22323 to related issues regarding the line breaks?

#10 Updated by Martin Cizek 4 days ago

Temporary workaround before the patch is merged

At the moment, it is possible to use redmine_common_mark plugin. It also allows for configuring commonmarker, but it has no HTML sanitizing implemented. Actually I found this patch when I was about to offer a pull request with html-pipeline to the plugin's author.

We’d still appreciate merging this patch ASAP.

#11 Updated by Martin Cizek 4 days ago

commonmarker configuration

These are the options used by GitLab. The options in the patch are the same (thumbs up!).

GitHub's configs have a few differences (it does not mean that we want them):
  • STRIKETHROUGH_DOUBLE_TILDE is not used in repos.
  • It uses HARDBREAKS in their issues, which is inconsistent with their wiki and repository rendering.
  • They have the tasklist extension enabled. I'd consider enabling it in Redmine, as tasklits is an officially documented GFM feature.

I can imagine that some Redmine users would like other options than us, as we have seen before in Redmine. And it would be quite legit e.g. for the tasklists mentioned above.

Making commonmarker configurable can mean way too many config options, which is difficult to support. A good compromise might be to make a hook for plugins, so that they can change the config.

I can create a follow-up ticket after getting some feedback.

#12 Updated by Martin Cizek 4 days ago

HTML sanitizing

HTML sanitizing is currently embedded in the commonmark formatter in the patch. That's a good start.

As #807 suggests, sanitizing should rather be a shared concept, eventually with small differences for different formats. GitLab did it this way, their base_sanitization_filter.rb contains common sanitization setup and CommonMark sanitization_filter.rb customizes the HTML::Pipeline::SanitizationFilter on top of that.

I can create a follow-up ticket after getting some feedback.

#13 Updated by Martin Cizek 4 days ago

Textile to Markdown migration

Pandoc is actually bad at this job. We tried Jens'es redmine_convert_textile_to_markdown (thanks for that!), we ran rendering comparison tests1 covering hundreds of thousands of strings and the results were poor.

1 Rendering comparison test = grab all rendered issues and wiki pages from the Textile Redmine instance and a converted-to-Markdown Redmine instance, normalize HTML, compare the HTMLs.

So we forked it, and similarly to Jens, we were adding more and more preprocessors to make Pandoc happy and postprocessors to render it correctly. This is the latest version of the fork. Later, we reworked it completely to a new project, which we'll publish soon.

But if we were doing it again, we would get rid of Pandoc completely. The preprocessing is done by partial rendering using code adapted from Redmine / Redcloth3. The amount of the code is comparable to normal rendering and invoking Pandoc just makes it slow.

Pandoc is a great tool, but this use case is just too specific for a universal format converter.

So the message is just: a good converter exists for Redmine. :)

#14 Updated by Jens Krämer 3 days ago

Hi, thanks for the feedback!

Nice catch about the rendering of Markdown files, that indeed should be switched to commonmark as well. Also great work on the Migrator :)

General use of HTML Pipeline

It would really make sense to use HTML pipeline for Textile rendering as well, as it will open up a lot of options for refactoring and sharing code (i.e. for HTML sanitization) between the different pipelines via filters. I intentionally stopped before doing this to gauge interest and get commonmark in as quickly as possible. We can (and imho should) rework the Textile rendering at a later point, as well as do some cleanup by moving rendering related code from helpers to the pipeline. The same could be done to the Redcarpet based markdown formatter if we decide to still keep that around at this time.

Configuration

For the sake of not overcomplicating this already complex patch I went with what I think are sensible defaults. I have no idea why Github is so inconsistent with their line break rendering and would definitely stay away from doing the same.

Introducing a way for configuring the rendering pipeline through plugins makes sense, maybe as part of the refactoring when we move over all formatters to the pipeline?

Regarding the task lists - to be honest I left this out since at Planio this would collide with the checklists plugin we have as a standard feature. Also this would require additional code to handle the checking / unchecking of the checkboxes in the rendered text (everywhere markdown is rendered) again making the patch more complex. I feel like this also really only makes sense in issue descriptions and (maybe) wiki pages. So this would be a case where we need slightly different pipeline config depending on the context.

#15 Updated by Hans Riekehof 1 day ago

Oh my this patch would be so nice. I know its always hard to say but is there any roadmap when this patch is available in an official release ?

#16 Updated by Mischa The Evil 1 day ago

Hans Riekehof wrote in #note-15:

[...] I know its always hard to say but is there any roadmap when this patch is available in an official release ?

Given all the parameters (size/scope/state/complexity of the patch, current Redmine release cycle, current issue scheduling) I'd say not earlier than before the end of 2020. However, this does not mean that it won't be available on the Redmine trunk earlier.

NB: Please keep this issue clean, focused, on-topic and free from superfluous comments...

#17 Updated by Go MAEDA about 20 hours ago

Thank you for posting the patch.

I think we should remove Redcarpet instead of adding the second Markdown formatter. Although I understand that CommonMaker is not fully compatible with Redcarpet, having two different Markdown formatter introduces some problems.

  • It is confusing. Maybe some users cannot understand why Redmine has two Markdown formatter and cannot determine which they should use
  • Consumes more memory

Someday after adding support for CommonMaker, we need to remove Redcarpet. In my opinion, when the time we add CommonMaker is the best chance to remove Redcarpet. Replacing Redcarpet with CommonMaker is a simple and understandable story for everyone.

Also available in: Atom PDF