Feature #22923

Export Wiki to ODT

Added by Gregor Schmidt over 1 year ago. Updated about 1 year ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 90%

Category: Wiki
Target version: -
Resolution:

Description

Attached is a patch that adds Wiki-to-ODT export capabilities to Redmine.

The export works similarly to the PDF export in that it mainly passes the generated HTML export to another library. For ODT, this is the html2odt gem, which in turn is based on xhtml2odt, which uses XSLT to transform HTML into OpenDocument-compatible XML.

The Redmine integration mainly consists of code to handle image paths and a bit of cleanup before passing the HTML to the library.


This change was implemented for a Planio customer who wanted to use the Wiki to create simple templates that users would then fill out using MS Word (and friends). We chose to export ODT (instead of, e.g., DOCX) because the OpenDocument format is an open standard, the tool support was better, and it is supported by a wide range of word processing applications (MS Word 2010 and later, LibreOffice, OpenOffice, AbiWord, Pages).

Please note that this feature benefits from the change proposed in #22898. Without it, aligned images within a paragraph cause errors in the export: the paragraph that follows is missing from the ODT.

HowTo_Install_Redmine_212_in_Ubuntu_1210_and_Apache_Passenger.txt (4.3 KB) Jean-Philippe Lang, 2016-06-06 12:29

HowTo_Install_Redmine_212_in_Ubuntu_1210_and_Apache_Passenger.odt (9.61 KB) Jean-Philippe Lang, 2016-06-06 12:29

Wiki.txt - Updated HowTo_Install_Redmine_212_in_Ubuntu_1210_and_Apache_Passenger.txt (4.33 KB) Gregor Schmidt, 2016-06-06 16:18

redmine.odt (587 KB) Jean-Philippe Lang, 2016-06-12 12:04

0001-Add-ODT-export-for-wiki-pages.patch (10.8 KB) Gregor Schmidt, 2016-06-24 14:20


Related issues

Duplicated by Redmine - Feature #16324: Wiki export as docx file. (New)
Blocked by Redmine - Patch #22898: !>image.png! generates invalid HTML (Closed)

History

#1 Updated by Jan from Planio www.plan.io over 1 year ago

  • Target version set to Candidate for next major release
  • % Done changed from 0 to 90

#2 Updated by Jan from Planio www.plan.io over 1 year ago

  • Blocked by Patch #22898: !>image.png! generates invalid HTML added

#3 Updated by Go MAEDA over 1 year ago

#4 Updated by Jan from Planio www.plan.io over 1 year ago

#5 Updated by Jan from Planio www.plan.io over 1 year ago

#6 Updated by Go MAEDA over 1 year ago

  • Target version changed from Candidate for next major release to 3.3.0

I tested this patch with LibreOffice for Mac and Word 2016 for Mac, and it works fine. I think it is very useful for writing documents based on wiki contents.

@Jean-Philippe, could this be included in 3.3.0?

#7 Updated by Jean-Philippe Lang over 1 year ago

I tested the patch on a copy of the Redmine wiki:

  • exporting the whole wiki doesn't respond/is too slow (had to kill ruby after 15 minutes), so we should probably disable this option. In comparison, exporting every page individually took about 2 minutes.
  • some pages generate documents that seem invalid (at least for Word 2007, which complains about an unspecified error when opening the generated ODT)
  • some pages that contain URLs trigger this kind of error (it seems to be caused by example URLs that are not valid URLs):
ActionView::Template::Error (the scheme http does not accept registry part: proxy.domain.tld:port (or bad hostname?)):
    1: <%= raw wiki_page_to_odt(@page, @project) %>
  c:/utils/ruby/lib/ruby/2.0.0/uri/generic.rb:1203:in `rescue in merge'
  c:/utils/ruby/lib/ruby/2.0.0/uri/generic.rb:1200:in `merge'
  html2odt (0.3.0) lib/html2odt/document.rb:270:in `block in fix_links'
  nokogiri-1.6.7.2-x86 (mingw32) lib/nokogiri/xml/node_set.rb:187:in `block in each'
  nokogiri-1.6.7.2-x86 (mingw32) lib/nokogiri/xml/node_set.rb:186:in `upto'
  nokogiri-1.6.7.2-x86 (mingw32) lib/nokogiri/xml/node_set.rb:186:in `each'
  html2odt (0.3.0) lib/html2odt/document.rb:269:in `fix_links'
  html2odt (0.3.0) lib/html2odt/document.rb:209:in `prepare_html'
  html2odt (0.3.0) lib/html2odt/document.rb:55:in `content_xml'
  html2odt (0.3.0) lib/html2odt/document.rb:142:in `block (4 levels) in data'
  rubyzip (1.2.0) lib/zip/entry.rb:495:in `get_input_stream'
  html2odt (0.3.0) lib/html2odt/document.rb:139:in `block (3 levels) in data'

I attached the raw content of a wiki page. With formatting set to Markdown, you should get the error; with formatting set to Textile, you should get the invalid ODT (which is also attached).

#8 Updated by Jan from Planio www.plan.io over 1 year ago

Jean-Philippe Lang wrote:

I tested the patch on a copy of the Redmine wiki:

Thank you for testing it. The long response times seem quite odd. It works quite fast for us, also with larger wikis. For comparison, you could try it on a new Planio account; we already have this running in production.

We hadn't tested it on Windows, though, which it seems you used for your tests, correct?

We will look into it and try to reproduce/fix these problems. Thanks again!

#9 Updated by Jean-Philippe Lang over 1 year ago

Jan from Planio www.plan.io wrote:

We hadn't tested it on Windows, though, which it seems you used for your tests, correct?

Correct, ruby 2.0.0p481 (2014-05-08) [i386-mingw32]

#10 Updated by Gregor Schmidt over 1 year ago

Thanks a lot for testing this patch/feature.

I have just set up a Windows development machine to verify your feedback:

  • exporting the whole wiki doesn't respond/is too slow (had to kill ruby after 15 minutes), so we should probably disable this option. In comparison, exporting every page individually took about 2 minutes.

This is very surprising. On my newly set up development virtual machine (on a 5-year-old MacBook) running Windows 7 and Ruby 2.0 (using Ruby Installer), the example wiki page you provided was generated within 0.4 seconds. This also matches my experience on Mac OS and Linux.

  • some pages generate documents that seem invalid (at least for Word 2007, which complains about an unspecified error when opening the generated ODT)

Unfortunately, this is true. xhtml2odt claims to handle only valid XHTML, and we have already encountered cases where even valid HTML was not handled in the expected way. #22898 was one such case; the given example document is another. Attached is an updated version of it. All I did was add newlines around the pre tags so that they are not entangled with the previous paragraph. But since we cannot expect users to write perfectly formatted content, I will look at how I can fix the problem at hand in any case. I just wanted to let you know about the underlying reason.

  • some pages that contain URLs trigger this kind of error (it seems to be caused by example URLs that are not valid URLs)

Thank you for this bug report. Indeed, handling of invalid URIs was missing in html2odt. The latest release (0.3.1) fixes that. Running bundle update html2odt should add it to your installation.
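For readers who run into the same trace: the error is raised while a link is resolved against a base URI and the link text is not a valid URI. Below is a minimal sketch of the general idea, using only Ruby's standard uri library. It is not the actual html2odt fix; the helper name safe_absolute_href and the base URL are made up for illustration.

```ruby
require "uri"

# Hypothetical helper (not part of html2odt): resolve href against base and
# return nil instead of raising when the input is not a valid URI.
def safe_absolute_href(base, href)
  URI.join(base, href).to_s
rescue URI::InvalidURIError
  nil
end

base = "http://redmine.example.org/projects/demo/wiki/" # assumed base URL

# A relative wiki link resolves against the base.
puts safe_absolute_href(base, "Install_guide").inspect

# The example URL from the trace above has a non-numeric port and is rejected.
puts safe_absolute_href(base, "http://proxy.domain.tld:port").inspect
```

With a helper like this, a caller could keep the link text and drop only the href whenever nil comes back, so one bad example URL does not abort the whole export.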


I will have a look at how I can handle the markup generated from your example content.

Could you try to narrow down why generating the ODT is so slow on your machine? Should we gather more feedback from other developers? What do you think?
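One way to narrow this down would be to time the export page by page and look at the slowest entries. The following is only a sketch under assumptions: time_each is a made-up helper, and the block body is a placeholder where the real per-page export call would go.

```ruby
# Hypothetical helper: run the block once per item and return
# [item, seconds] pairs, slowest first.
def time_each(items)
  timings = items.map do |item|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield item
    [item, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end
  timings.sort_by { |_item, seconds| -seconds }
end

pages = %w[Guide FAQ Changelog] # placeholder page titles

report = time_each(pages) do |title|
  # Placeholder workload; replace with the actual export of one wiki page.
  sleep(0.05) if title == "Changelog"
end

report.each { |title, seconds| puts format("%-12s %.3fs", title, seconds) }
```

If a single page dominates the list, its raw content would be a good candidate for a reduced test case.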

#11 Updated by Gregor Schmidt over 1 year ago

We have just released html2odt v0.3.3, which addresses all problems that came up during your tests.

The only remaining problem would be the speed issues you saw. But I am afraid that I cannot isolate and reproduce those without further input.

#12 Updated by Jean-Philippe Lang over 1 year ago

  • File redmine.odt added
  • Target version deleted (3.3.0)

Gregor Schmidt wrote:

This is very surprising. On my newly set up development virtual machine (on a 5-year-old MacBook) running Windows 7 and Ruby 2.0 (using Ruby Installer), the example wiki page you provided was generated within 0.4 seconds. This also matches my experience on Mac OS and Linux.

Yes, I had good response times when exporting single wiki pages, as I mentioned before. The problem occurred when exporting the whole wiki in one ODT file (by using the export link on the wiki page index). This was not a Windows issue, as I get the same behaviour when testing under Linux (I let it run for more than one hour before killing WEBrick).

This problem no longer occurs with the same wiki content and html2odt 0.3.3, so I guess it was an html2odt or nokogiri issue (html2odt 0.3.3 uses a different nokogiri version). Now I get the whole wiki export in a minute (on both Windows and Linux), but the resulting ODT seems to be invalid. You'll find it attached.

This feature seems to be useful for very few users and I prefer not to add it to the core. But I'd be happy to refactor a few things in order to make it easier for plugins to add new export formats, without having to patch views and controllers.

#13 Updated by Gregor Schmidt over 1 year ago

Thanks again for taking the time to review and test the changes and for giving such detailed feedback. This is very much appreciated.

Jean-Philippe Lang wrote:

This problem no longer occurs with the same wiki content and html2odt 0.3.3, so I guess it was an html2odt or nokogiri issue (html2odt 0.3.3 uses a different nokogiri version). Now I get the whole wiki export in a minute (on both Windows and Linux), but the resulting ODT seems to be invalid. You'll find it attached.

Thanks for the feedback. I am not aware of any speed-related improvements in html2odt; at least I was not aiming for them. But I am glad that it's working better now.

Thanks for providing the ODT. I'll have a look and see if I can isolate the root cause of the error.

This feature seems to be useful for very few users and I prefer not to add it to the core.

I am sorry to hear that, but I can totally understand your decision.

But I'd be happy to refactor a few things in order to make it easier for plugins to add new export formats, without having to patch views and controllers.

That would be great. We would be happy to create a plugin with the same features. If Redmine had some kind of export registry, we could easily hook into that.

We are looking forward to those changes. Let me know if you would like us to help with those refactorings.

#14 Updated by Gregor Schmidt over 1 year ago

  • File 0001-Add-ODT-export-for-wiki-pages.patch added

Thanks again for providing the erroneous ODT. Now that I see that it is more than 800 pages long, I can imagine why it takes a minute to generate. I assume the PDF export would not be a lot faster.

Concerning the ODT error, I was able to identify the root cause. It is related to the HTML generated by the collapse macro. The error was located in the export of this wiki page.

Attached is an updated patch [1], which further cleans up the HTML before handing it over to html2odt. I am leaving this here for future reference.

[1] replacing the first one

#15 Updated by Jan from Planio www.plan.io over 1 year ago

  • File deleted (0001-Add-ODT-export-for-wiki-pages.patch)

#16 Updated by Gregor Schmidt over 1 year ago

Previous versions of the patch contained a path traversal vulnerability that allowed attackers to access image files outside of Rails.public_path. Attached is an updated patch that includes the fix.
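For reference, a common way to guard against this class of bug is to normalize the requested path and check that it still lies below the public directory. The sketch below illustrates that check with plain Ruby. It is not the code from the patch; resolve_public_image and the directory names are hypothetical.

```ruby
# Hypothetical helper, not taken from the patch: map a requested image path
# to a file below public_root, or return nil if it would escape that directory.
def resolve_public_image(public_root, requested_path)
  root = File.expand_path(public_root)
  # expand_path collapses "." and ".." segments before the prefix check.
  candidate = File.expand_path(File.join(root, requested_path))
  candidate.start_with?(root + File::SEPARATOR) ? candidate : nil
end

public_root = "/srv/redmine/public" # stand-in for Rails.public_path

puts resolve_public_image(public_root, "images/logo.png").inspect
puts resolve_public_image(public_root, "../config/database.yml").inspect
```

This check is purely lexical. It does not follow symlinks, so a stricter variant would also compare File.realpath results for files that exist.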

#17 Updated by Jan from Planio www.plan.io over 1 year ago

  • File deleted (0001-Add-ODT-export-for-wiki-pages.patch)

#18 Updated by Antonín Heřmánek about 1 year ago

Wiki pages are converted without images. Running on Debian stable, Ruby 2.1.5, Redmine 3.1.1.
