



I am looking for an HTML-to-wiki website translator. Basically, I want to publish the coverage reports generated by Cobertura to my Google Code website. But Google Code only supports wiki pages, so if someone can point me to a translator that converts an HTML website into wiki pages (linked together), I can publish my coverage reports.

+3  A: 

There is a pretty good translator available here. It also supports the Google Code wiki syntax.

See if this can help you out.

@Tingu Thanks. I will check this out.
Faisal Feroz
@Tingu Well, it's a pretty good translator, but it can only translate one page at a time. As I said, I am looking for a translator/converter that can take a complete website and convert it to wiki syntax with all the pages linked together. Thanks anyway.
Faisal Feroz
I am accepting this answer as there are no other good answers available and awarding the bounty as well.
Faisal Feroz
+2  A: 

I'm not familiar with any such translators, but it wouldn't be difficult for you to hack up a quick wiki markup DOM serializer on your own as a last resort.

Just write a function to parse the HTML using a DOM parser (my favorite is lxml, the Python binding for libxml2), serialize to wiki markup via depth-first traversal, and then wrap the whole thing in a ready-made spidering framework. (Or whip up your own. That's not too difficult either.)

Something like this Python code (using StackOverflow markup as the example):

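import lxml.html

# Map each HTML tag name to the wiki markup that opens and closes it.
tags = {
    'b'       : {'start': '**', 'end': '**'},
    'em'      : {'start': '*', 'end': '*'},
    'i'       : {'start': '*', 'end': '*'},
    'strong'  : {'start': '**', 'end': '**'},
    # etc.
}

def serialize(node):
    # Tags without an entry contribute no markup, only their text and children.
    tag = tags.get(node.tag, {})

    return ''.join([tag.get('start', ''), node.text or ''] +
                   [serialize(child) for child in node] +
                   [tag.get('end', ''), node.tail or ''])

# Sample input; in practice this would be the HTML of one report page.
html_source = '<p>Some <b>bold</b> and <i>italic</i> text.</p>'
domRoot = lxml.html.fromstring(html_source)
wiki_markup = serialize(domRoot)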

That took me maybe 5 minutes and I could probably implement the whole thing in under an hour.

I left out the more complicated bits for handling block markup (stuff where newlines, indentation, or line-starting characters are significant) and footnote-style link definition, but that's not very difficult... especially if you add an optional callback argument to the tag definition structure.
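The callback idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the `'callback'` key and the `heading`, `link`, and `list_item` helpers are names I made up for the example, the output is Markdown-style to match the StackOverflow markup used above (Google Code wiki syntax would need different strings), and it uses the standard library's `xml.etree.ElementTree` instead of lxml so it runs without extra installs. ElementTree exposes the same `.tag`/`.text`/`.tail` attributes but requires well-formed input.

```python
import xml.etree.ElementTree as ET

def heading(level):
    """Build a callback that renders a node as a Markdown-style heading."""
    def render(node, inner):
        return '\n\n' + '#' * level + ' ' + inner.strip() + '\n\n'
    return render

def link(node, inner):
    """Render <a href="..."> as an inline link; fall back to plain text."""
    href = node.get('href')
    return '[%s](%s)' % (inner, href) if href else inner

def list_item(node, inner):
    """Render <li> as a line starting with a dash."""
    return '\n- ' + inner.strip()

# Simple tags keep start/end strings; block-level tags use a callback instead.
TAGS = {
    'b':      {'start': '**', 'end': '**'},
    'strong': {'start': '**', 'end': '**'},
    'em':     {'start': '*', 'end': '*'},
    'i':      {'start': '*', 'end': '*'},
    'h1':     {'callback': heading(1)},
    'h2':     {'callback': heading(2)},
    'li':     {'callback': list_item},
    'a':      {'callback': link},
}

def serialize(node):
    tag = TAGS.get(node.tag, {})
    # Depth-first: the node's own text, then each child in document order.
    inner = (node.text or '') + ''.join(serialize(child) for child in node)
    callback = tag.get('callback')
    if callback is not None:
        body = callback(node, inner)
    else:
        body = tag.get('start', '') + inner + tag.get('end', '')
    # The tail is text that follows the closing tag, so it stays outside the markup.
    return body + (node.tail or '')
```

Because each callback receives the node as well as the already-serialized inner text, it can read attributes such as `href`, which is also where links between converted pages could be rewritten to point at wiki page names.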

Really, the only time-consuming part is reinventing the Makefile-style "only update what's been changed" caching.
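For a report that already sits on disk as a directory of HTML files, a simple version of that caching can compare modification times, the way make does. The sketch below is one possible approach, not a finished tool: `is_stale` and `convert_site` are hypothetical names, the `.wiki` output extension is an assumption, and `convert` stands for whatever HTML-to-wiki function you end up writing.

```python
import os

def is_stale(source_path, target_path):
    """Return True if the target is missing or older than its source."""
    if not os.path.exists(target_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)

def convert_site(source_dir, target_dir, convert):
    """Convert every .html file under source_dir, skipping up-to-date pages.

    `convert` is a caller-supplied function that takes HTML text and
    returns wiki markup. Returns the list of target paths that were written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in sorted(filenames):
            if not name.endswith('.html'):
                continue
            source_path = os.path.join(dirpath, name)
            # Mirror the source layout under target_dir, swapping the extension.
            relative = os.path.relpath(source_path, source_dir)
            target_path = os.path.join(
                target_dir, os.path.splitext(relative)[0] + '.wiki')
            if not is_stale(source_path, target_path):
                continue
            os.makedirs(os.path.dirname(target_path) or '.', exist_ok=True)
            with open(source_path, encoding='utf-8') as src:
                markup = convert(src.read())
            with open(target_path, 'w', encoding='utf-8') as dst:
                dst.write(markup)
            written.append(target_path)
    return written
```

Timestamps are cheap to check but can be fooled by files that are regenerated with identical content; comparing content hashes would avoid that at the cost of reading every file on each run.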
