CommonMark Markdown Text Formatting
This patch introduces a new text formatting named CommonMark Markdown (GitHub Flavored). It is based on CommonMarker and HTMLPipeline. The formatter was extracted from Planio where it will soon become the default for new accounts.
We built this instead of going with the existing RedCarpet Markdown implementation for a number of reasons:
- From time to time users who are using the current Markdown formatter ask for a spec / formal list of all supported features. No such thing exists for RedCarpet. There is CommonMark but RedCarpet isn't going to support it in the short to medium term (see next point).
- The future development of RedCarpet is uncertain. Few excerpts from a GitHub issue , a year ago:
Commonmark won't be supported anytime soon
A general message about the project for people skimming this thread: I'm sorry Redcarpet isn't really active anymore but my resources are pretty limited since I'm committed to other open source projects and still a student. Feel free to use any existing alternative ; this project isn't the great tool it used to be when it was maintained by Vicent.
- With CommonMark evolving as a Markdown spec that is supported by many implementations and endorsed by organizations like Gitlab and GitHub (which both did the switch from RedCarpet to CommonMarker a while ago), it quickly becomes what users expect when they hear 'Markdown'.
- Migrating existing Textile content is a bit easier since Pandoc has a dedicated Github Flavored Markdown writer module.
- The HTML pipeline approach encourages splitting up the formatting process into it's different aspects (html generation, sanitizing, syntax highlighting etc) which allows for better testability and has potential for future re-use of components with other text formatters. Further, HTML pipeline filters work on actual DOM nodes, making tasks like adding classes to links etc much more straight forward and less prone to bugs than doing so with regular expressions.
Last but not least, this formatter solves a number of currently open issues regarding the RedCarpet based Markdown Formatter:
- #19880 (Incorrect syntax for links in Markdown)
- #20497 (Markdown formatting supporting HTML)
- #20841 (Bare URLs in Markdown don't have "external" class)
- #29172 (Markdown: External links broken)
The main reason why we want to introduce this as a third formatting option and not just replace the existing Markdown formatter is line endings - this formatter does not insert hard breaks (
<br>) for simple newlines, but requires either a
\ or two spaces at the end of the line to do so. Over time the new formatter could become the default and the RedCarpet based one might be renamed to Markdown (legacy) or similar.
English help files are included, and the patch makes sure these are delivered for any other languages as well until corresponding localized versions are available.
- Target version changed from Candidate for next major release to 4.2.0
I agree that Redcarpet is not active anymore. The number of commits made in this year is only 3. And I think many Markdown users expect Redmine to behave as CommonMark/GFM compliant because many apps/services that support it.
Let's start discussion to deliver this in 4.2.0.
#9 Updated by Martin Cizek 4 days ago
- File 0002-attachments_helper-commonmark.patch added
Love you guys! We are playing with markup format conversions and preparing bulletproof arguments to integrate commonmarker + HTML sanitization for almost half a year. :) Jens did this job perfectly and the only unused argument regarding Redcarpet was #32563. I even tried to make a bulletproof pull request to solve an unpleasant Redcarpet issue observed in our Textile->Markdown migration (since 2013) to test if the project is really dead. It is.
Just a very small improvement is attached.
I'd sign every word written by Jens and would like to share a few remarks in subsequent comments, hope they help.
Go MAEDA, would you mind adding #22323 to related issues regarding the line breaks?
#10 Updated by Martin Cizek 4 days ago
Temporary workaround before the patch is merged¶
At the moment, it is possible to use redmine_common_mark plugin. It also allows for configuring commonmarker, but it has no HTML sanitizing implemented. Actually I found this patch when I was about to offer a pull request with
html-pipeline to the plugin's author.
We’d still appreciate merging this patch ASAP.
#11 Updated by Martin Cizek 4 days ago
These are the options used by GitLab. The options in the patch are the same (thumbs up!).GitHub's configs have a few differences (it does not mean that we want them):
STRIKETHROUGH_DOUBLE_TILDEis not used in repos.
- It uses
HARDBREAKSin their issues, which is inconsistent with their wiki and repository rendering.
- They have the
tasklistextension enabled. I'd consider enabling it in Redmine, as
tasklitsis an officially documented GFM feature.
I can imagine that some Redmine users would like other options than us, as we have seen before in Redmine. And it would be quite legit e.g. for the tasklists mentioned above.
Making commonmarker configurable can mean way too many config options, which is difficult to support. A good compromise might be to make a hook for plugins, so that they can change the config.
I can create a follow-up ticket after getting some feedback.
#12 Updated by Martin Cizek 4 days ago
HTML sanitizing is currently embedded in the commonmark formatter in the patch. That's a good start.
As #807 suggests, sanitizing should rather be a shared concept, eventually with small differences for different formats. GitLab did it this way, their base_sanitization_filter.rb contains common sanitization setup and CommonMark sanitization_filter.rb customizes the
HTML::Pipeline::SanitizationFilter on top of that.
I can create a follow-up ticket after getting some feedback.
#13 Updated by Martin Cizek 4 days ago
Textile to Markdown migration¶
Pandoc is actually bad at this job. We tried Jens'es redmine_convert_textile_to_markdown (thanks for that!), we ran rendering comparison tests1 covering hundreds of thousands of strings and the results were poor.
1 Rendering comparison test = grab all rendered issues and wiki pages from the Textile Redmine instance and a converted-to-Markdown Redmine instance, normalize HTML, compare the HTMLs.
So we forked it, and similarly to Jens, we were adding more and more preprocessors to make Pandoc happy and postprocessors to render it correctly. This is the
latest version of the fork. Later, we reworked it completely to a new project, which we'll publish soon.
But if we were doing it again, we would get rid of Pandoc completely. The preprocessing is done by partial rendering using code adapted from Redmine / Redcloth3. The amount of the code is comparable to normal rendering and invoking Pandoc just makes it slow.
Pandoc is a great tool, but this use case is just too specific for a universal format converter.
So the message is just: a good converter exists for Redmine. :)
#14 Updated by Jens Krämer 3 days ago
Hi, thanks for the feedback!
Nice catch about the rendering of Markdown files, that indeed should be switched to commonmark as well. Also great work on the Migrator :)
General use of HTML Pipeline
It would really make sense to use HTML pipeline for Textile rendering as well, as it will open up a lot of options for refactoring and sharing code (i.e. for HTML sanitization) between the different pipelines via filters. I intentionally stopped before doing this to gauge interest and get commonmark in as quickly as possible. We can (and imho should) rework the Textile rendering at a later point, as well as do some cleanup by moving rendering related code from helpers to the pipeline. The same could be done to the Redcarpet based markdown formatter if we decide to still keep that around at this time.
For the sake of not overcomplicating this already complex patch I went with what I think are sensible defaults. I have no idea why Github is so inconsistent with their line break rendering and would definitely stay away from doing the same.
Introducing a way for configuring the rendering pipeline through plugins makes sense, maybe as part of the refactoring when we move over all formatters to the pipeline?
Regarding the task lists - to be honest I left this out since at Planio this would collide with the checklists plugin we have as a standard feature. Also this would require additional code to handle the checking / unchecking of the checkboxes in the rendered text (everywhere markdown is rendered) again making the patch more complex. I feel like this also really only makes sense in issue descriptions and (maybe) wiki pages. So this would be a case where we need slightly different pipeline config depending on the context.
#16 Updated by Mischa The Evil 1 day ago
Hans Riekehof wrote in #note-15:
[...] I know its always hard to say but is there any roadmap when this patch is available in an official release ?
Given all the parameters (size/scope/state/complexity of the patch, current Redmine release cycle, current issue scheduling) I'd say not earlier than before the end of 2020. However, this does not mean that it won't be available on the Redmine trunk earlier.
NB: Please keep this issue clean, focused, on-topic and free from superfluous comments...
#17 Updated by Go MAEDA about 20 hours ago
Thank you for posting the patch.
I think we should remove Redcarpet instead of adding the second Markdown formatter. Although I understand that CommonMaker is not fully compatible with Redcarpet, having two different Markdown formatter introduces some problems.
- It is confusing. Maybe some users cannot understand why Redmine has two Markdown formatter and cannot determine which they should use
- Consumes more memory
Someday after adding support for CommonMaker, we need to remove Redcarpet. In my opinion, when the time we add CommonMaker is the best chance to remove Redcarpet. Replacing Redcarpet with CommonMaker is a simple and understandable story for everyone.