I'm interested in selectively parsing MediaWiki XML markup to generate a customized HTML page containing a subset of the HTML produced by MediaWiki's own PHP rendering engine.

I want it for BzReader, an offline reader for compressed MediaWiki dumps, written in C#. So a C# parser would be ideal, but any good code would help.

Of course, if no one has done this before, I guess it's time to start a project to maintain a free, standalone MediaWiki parser, based on MediaWiki's own parser but less tightly integrated with MediaWiki itself.

So, does anyone know of a base I could start from that would be better than hacking on the MediaWiki PHP code?

+6  A: 

There is a list of parsers at http://www.mediawiki.org/wiki/Alternative_parsers, but it doesn't include a C# parser...

Wimmel
For .NET integration, he could use IronPython, though.
Dana the Sane
+1  A: 

I had some words to say about MediaWiki templates here. It's interesting that there's now a list of alternative parsers; I'll have to investigate that.

Greg Hewgill
+2  A: 

ScrewTurn is released under the GPL and is written in C#, and it includes a MediaWiki parser.

The class you are after is Core.Formatter, which uses lots of regexes to do its work:

public static class Formatter {

}

It's not the nicest looking code "but it works".
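
To give a feel for the regex-driven approach, here is a minimal sketch. It is not ScrewTurn's actual Formatter code: the class name WikiSketch, the method ToHtml, and the choice of supported markup (headings, bold, italic, internal links) are all assumptions made for illustration. Templates, tables, and nested markup need far more than this.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, NOT ScrewTurn's Core.Formatter: a few regex passes
// that cover only a tiny subset of wikitext.
public static class WikiSketch
{
    // "== Heading ==" on its own line; the number of '=' signs gives the level.
    private static readonly Regex Heading =
        new Regex(@"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", RegexOptions.Multiline);

    private static readonly Regex Bold = new Regex(@"'''(.+?)'''");
    private static readonly Regex Italic = new Regex(@"''(.+?)''");

    // "[[Target]]" or "[[Target|label]]".
    private static readonly Regex Link =
        new Regex(@"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]");

    public static string ToHtml(string wikitext)
    {
        // Escape HTML-significant characters first so raw markup in the
        // source cannot leak into the output.
        string html = wikitext
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;");

        html = Heading.Replace(html, m =>
        {
            int level = m.Groups[1].Length;
            return "<h" + level + ">" + m.Groups[2].Value + "</h" + level + ">";
        });

        // Bold must run before italic, because ''' also contains ''.
        html = Bold.Replace(html, "<b>$1</b>");
        html = Italic.Replace(html, "<i>$1</i>");

        html = Link.Replace(html, m =>
        {
            string target = m.Groups[1].Value.Trim();
            string label = m.Groups[2].Success ? m.Groups[2].Value : target;
            // Assumption: page names map to relative URLs with underscores.
            return "<a href=\"" + target.Replace(' ', '_') + "\">" + label + "</a>";
        });

        return html;
    }
}
```

Calling WikiSketch.ToHtml("'''bold''' and [[Main Page|home]]") returns the bold run wrapped in b tags and the link as an anchor pointing at Main_Page. The order of the passes matters, which is one reason this style of parser gets messy as more syntax is added.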

Chris S
Thanks! Excellent resource.
Asaf Bartov