Defect #35036

Markdown text sections broken by thematic breaks (horizontal rules)

Added by Martin Cizek 11 days ago. Updated 11 days ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Text formatting
Target version: -
Resolution:
Affected version: 4.2.0

Description

A thematic break composed of hyphens (e.g. "---") breaks the division of Markdown text into individually editable sections.

Steps to reproduce

Configure Markdown text formatting, create a Wiki page in a web browser, and enter the following content:

# Title
## Heading 2
Preceding CRLF is the default for web-submitted data.

---

End of thematic breaks.

## Heading 2
Nulla nunc nisi, egestas in ornare vel, posuere ac libero.

More in the unit tests in the enclosed patch.

Cause

The cause is that the thematic break is mistaken for a setext heading underline. The current regexp in extract_sections does try to restrict setext headings so that the underline must follow a non-empty line, but it does not account for a whitespace-only line or even a plain CRLF. Text submitted from a web browser always contains CRLF line endings, so the problem is fairly common even for carefully formatted text.
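The confusion can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: NAIVE_SETEXT, STRICT_SETEXT and setext_heading? are hypothetical names, and the patterns are simplified stand-ins, not the actual regexps in extract_sections or in the attached patch. The point is that in Ruby "." matches "\r", so a line that looks blank but carries a CR still counts as "non-empty" for a naive pattern.

```ruby
# Hypothetical, simplified patterns for illustration only;
# they are NOT the regexps used in Redmine's extract_sections.

# Naive rule: any line with at least one character, followed by a run
# of '=' or '-', is treated as a setext heading.
# '.' matches "\r" and spaces, so a "blank" CRLF line satisfies '.+'.
NAIVE_SETEXT = /^(.+)\r?\n(=+|-+)[ \t]*\r?$/

# Stricter rule: the line above the underline must contain at least
# one non-whitespace character ("\r" does not match \S).
STRICT_SETEXT = /^([ \t]*\S.*?)\r?\n(=+|-+)[ \t]*\r?$/

# Returns true if the pattern finds a setext heading anywhere in text.
def setext_heading?(text, pattern)
  pattern.match?(text)
end

crlf_break   = "Some paragraph.\r\n\r\n---\r\n\r\nNext paragraph.\r\n"
lf_break     = "Some paragraph.\n\n---\n\nNext paragraph.\n"
spaces_break = "Some paragraph.\n   \n---\n\nNext paragraph.\n"
real_heading = "Heading 2\r\n---------\r\n"

puts setext_heading?(crlf_break, NAIVE_SETEXT)    # thematic break misread as heading
puts setext_heading?(lf_break, NAIVE_SETEXT)      # LF-only input is not affected
puts setext_heading?(crlf_break, STRICT_SETEXT)   # stricter rule ignores the break
puts setext_heading?(real_heading, STRICT_SETEXT) # genuine setext heading still found
```

With the naive pattern, only the CRLF and whitespace-only variants are misdetected, which matches the observation that the problem shows up mainly for browser-submitted text.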

Fix

Attaching a patch with a fix and corresponding unit tests.

Broader context

The current approach to section extraction is inherently fragile, as shown by the other (skipped) unit test enclosed in the patch. I'd suggest keeping the skipped test there to mark it as a known issue. I will create a dedicated issue for this.

0001-markdown_formatter-extract_sections-fix.patch - extract_sections fix for dash thematic breaks + unit tests (2.76 KB) Martin Cizek, 2021-04-05 16:05

History

#2 Updated by Martin Cizek 11 days ago

Just to make it clear: the skipped test addresses an already existing error, which is probably not worth fixing with the current approach in the extract_sections implementation. See #35037 for more details.

But the specific and common error in thematic break handling is fixed by the patch "as is"; it can simply be applied. :)
