I need to convert HTML documents (generated from DocBook XML documents) to wiki markup, specifically PmWiki markup. The goal is to include the company's application operations guides in our newly created wiki. This means that I actually have two options:

  1. Convert the HTML files (generated from the DocBook XML) to wiki markup
  2. Convert the DocBook XML files directly to wiki markup

Since the HTML files are generated by a DocBook-to-HTML converter, the tag structure does not vary much from one document to the next; only the content differs.

I am looking for a solution that I could implement quickly myself. I will have to run this conversion once now and then again every time a new version of the application operations guides is created.

Solutions that I've thought of so far:

  1. Convert HTML to wiki markup with a Perl or PHP script based on regular expressions.
  2. Convert the DocBook XML directly to wiki markup. Since it is XML, I could use Java for XML parsing. The risk here is that I am not as familiar with the DocBook XML format as I am with HTML, so this may take some time to learn.
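To make option 2 more concrete, here is a minimal sketch of what a direct DocBook-to-PmWiki conversion could look like. It is written in Python rather than Java purely for brevity, and it rests on several assumptions: the DocBook source has no XML namespace (DocBook 4 style), only a handful of elements (`title`, `para`, `section`, `itemizedlist`, `orderedlist`, `emphasis`) matter, and the function names `text_of` and `docbook_to_pmwiki` as well as the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real operations guides will use many more elements.
SAMPLE = (
    "<article><title>Ops Guide</title>"
    "<section><title>Startup</title>"
    "<para>Run the <emphasis>start</emphasis> script.</para>"
    "<itemizedlist>"
    "<listitem><para>Check logs</para></listitem>"
    "<listitem><para>Check ports</para></listitem>"
    "</itemizedlist>"
    "</section></article>"
)


def text_of(elem):
    """Flatten an element to text, rendering <emphasis> as PmWiki italics."""
    parts = [elem.text or ""]
    for child in elem:
        inner = text_of(child)
        parts.append("''" + inner + "''" if child.tag == "emphasis" else inner)
        parts.append(child.tail or "")
    # Collapse runs of whitespace left over from XML indentation.
    return " ".join("".join(parts).split())


def docbook_to_pmwiki(elem, depth=0, out=None):
    """Walk the tree and return a list of PmWiki markup lines."""
    out = [] if out is None else out
    for child in elem:
        if child.tag == "title":
            # Nesting depth decides the heading level: "!", "!!", "!!!", ...
            out.append("!" * (depth + 1) + " " + text_of(child))
        elif child.tag == "para":
            out.append(text_of(child))
            out.append("")  # blank line ends the paragraph
        elif child.tag in ("itemizedlist", "orderedlist"):
            bullet = "*" if child.tag == "itemizedlist" else "#"
            for item in child.findall("listitem"):
                out.append(bullet + " " + text_of(item))
        elif child.tag in ("section", "chapter"):
            docbook_to_pmwiki(child, depth + 1, out)
        # Any other element is silently skipped in this sketch.
    return out


if __name__ == "__main__":
    print("\n".join(docbook_to_pmwiki(ET.fromstring(SAMPLE))))
```

Every DocBook element used in the guides would need its own rule, which is where the learning cost mentioned above comes in.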

What approach would you choose for this work?

Update:

I just tried a PmWiki extension called ConvertHTML. It did not work well, because it does not convert all HTML tags (some tags are left unconverted, as-is, in the wiki page), as its documentation says:

PmWiki markup does not support all of the HTML markup so a 100% conversion is not possible. However, PmWiki can make replacements to the text as it is being edited or saved. ConvertHTML implements a relatively comprehensive set of rules for converting HTML tags to wiki markup.

+1  A: 

I used Digester to generate Java objects from a simple XML file and then modified them for my needs in Java. It is a very simple tool to use. Maybe you want to give it a try. It worked for me.

bastianneu
Digester is really cool if you are working with small XML files. But as the XML files get bigger, one should really use another parser, because Digester is one of the slowest when it comes to big files (noticeable at file sizes above 5-10 MB).
Thank you for that addition
bastianneu
That's interesting. But I would then need to generate HTML from the Java object. I don't think it would be the easiest solution to implement in this case.
Bruno Rothgiesser
+4  A: 

This might be useful, though it converts from DocBook to MediaWiki, not PmWiki.

There are Perl modules which can convert HTML to various Wiki dialects: HTML::WikiConverter. So if you can get your DocBook into HTML, then that might also work.
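To illustrate the general idea of a parser-based (rather than regex-based) HTML-to-wiki conversion, here is a minimal Python sketch. It is not HTML::WikiConverter and does not use its API; the class `PmWikiSketch` and the function `html_to_pmwiki` are hypothetical names, and only a few tags are mapped to what I understand to be the usual PmWiki markup (`!` headings, `'''bold'''`, `''italic''`, `@@monospace@@`, `*` and `#` lists).

```python
import re
from html.parser import HTMLParser


class PmWikiSketch(HTMLParser):
    """Event-driven converter for a small, assumed subset of HTML."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''",
              "code": "@@", "tt": "@@"}
    HEADINGS = {"h1": "!", "h2": "!!", "h3": "!!!"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.lists = []  # stack of "*" / "#" for the currently open lists

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in ("ul", "ol"):
            self.lists.append("*" if tag == "ul" else "#")
        elif tag == "li" and self.lists:
            # Nested lists repeat the bullet character: "*", "**", ...
            self.out.append("\n" + self.lists[-1] * len(self.lists) + " ")
        elif tag == "p" and not self.lists:
            # A <p> inside a list item would break the item, so skip it there.
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n")
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            self.out.append("\n")

    def handle_data(self, data):
        # Whitespace is normalized in text only; markup is never regex-parsed.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_pmwiki(html):
    parser = PmWikiSketch()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    print(html_to_pmwiki("<h2>Startup</h2><p>Run the <b>start</b> script.</p>"))
```

Tags without a rule are dropped while their text is kept, so tables, links, and images would each need additional handling. That is exactly the work an existing converter saves you.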

uckelman
+1 for `HTML::WikiConverter`. It looks good. Whatever you do, don't parse HTML using regular expressions. ;-)
Sinan Ünür
HTML::WikiConverter seems to be what I need. I'll give it a try today. The PMWiki dialect that I want is supported: http://search.cpan.org/~diberri/HTML-WikiConverter-PmWiki-0.51/lib/HTML/WikiConverter/PmWiki.pm
Bruno Rothgiesser
HTML::WikiConverter worked well. It was not a perfect conversion, but it was the best solution that I have found so far.
Bruno Rothgiesser