I'd like to get the HTML for a MediaWiki page; that is, I want to run the MediaWiki markup through the parser. I know I could use an external parser, but most of them do not support transclusion or (naturally) extensions, so their output would differ from what the wiki itself renders.

As I have access to the MediaWiki installation, I wonder whether I can use the built-in parser to render the page for me. I don't want to do screen scraping because of all the other material on the page (navigation, sidebar, JavaScript and CSS includes, etc.); I literally just want the body.

If it matters, it is running MediaWiki 1.12 on PHP 5.2.

+1  A: 

Yes, you can do that. As a matter of fact, I remember doing this very thing in many of my extensions.

Here is one of my extensions that does this: SecureTransclusion.

A snippet follows:

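public function mg_strans( &$parser, $page, $errorMessage = null, $timeout = 5 ) {

 // Refuse to run if the calling page is not allowed to use this function.
 if (!self::checkExecuteRight( $parser->mTitle ))
  return 'SecureTransclusion: '.wfMsg('badaccess');

 // $page is a page title given as text; turn it into a Title object.
 $title = Title::newFromText( $page );
 if (!is_object( $title ))
  return 'SecureTransclusion: '.wfMsg('badtitle')." ($page)";

 // Fetch the wikitext, remotely for interwiki-transcludable titles, locally otherwise.
 if ( $title->isTrans() )
  $content = $this->getRemotePage( $parser, $title, $errorMessage, $timeout );
 else
  $content = $this->getLocalPage( $title, $errorMessage );

 // Run the wikitext through the built-in parser and take the resulting HTML.
 $po = $parser->parse( $content, $parser->mTitle, new ParserOptions() );
 $html = $po->getText();

 // Return the HTML as-is so that it is not parsed a second time.
 return array( $html, 'noparse' => true, 'isHTML' => true );
}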
jldupont
Thanks, that should help! I assume that $page is the title of the page and not the wikitext?
Michael Stum
A: 

How about using the current MediaWiki parser? Just grab the converted output from the rendered page: everything from <!-- start content --> up to either <div class="printfooter"> or the NewPP limit report. The latter marks the start of the preprocessor's statistics. That way all the side frames and banners are omitted.
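
As a rough illustration of the idea above, here is a minimal Python sketch that cuts the body out of an already downloaded page. The function name `extract_body` is hypothetical, and the sketch assumes the skin emits the markers exactly as quoted above and that the limit report sits inside an HTML comment; a different skin or version may not do either.

```python
from typing import Optional

# Marker strings quoted in the answer above; they depend on the skin in use.
START_MARKER = "<!-- start content -->"
END_MARKERS = ('<div class="printfooter">', "NewPP limit report")


def extract_body(page_html: str) -> Optional[str]:
    """Return the HTML between the start marker and the earliest end marker.

    Returns None when the expected markers are missing.
    """
    start = page_html.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)

    # Search for every end marker after the start and keep the earliest hit.
    hits = [pos for pos in (page_html.find(m, start) for m in END_MARKERS)
            if pos != -1]
    if not hits:
        return None

    body = page_html[start:min(hits)].rstrip()
    # Assumption: the limit report lives inside an HTML comment, so cutting
    # at its text can leave a dangling comment opener behind. Drop it.
    if body.endswith("<!--"):
        body = body[:-len("<!--")]
    return body.strip()
```

Returning `None` instead of raising lets the caller decide what to do when the markers are not found.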

wallyk
That requires the page design to stay constant; any design change may break the extraction by removing the strings it looks for.
Michael Stum
True enough, but it's up to the implementer to decide if that's an acceptable tradeoff. I just think of the ideas; it's up to others to decide whether to use them, or to modify them further to suit the problem.
wallyk
+3  A: 

Use action=render, which returns only the rendered page body without the surrounding skin; e.g. index.php?title=Article_title&action=render
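
For a crawler, the request could be built along these lines. This is only a sketch: `render_url` and `fetch_rendered` are hypothetical helper names, the `index.php` location is whatever your wiki uses, and the 10-second timeout is an arbitrary choice.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(index_php: str, title: str) -> str:
    """Build the action=render URL for a page title."""
    query = urlencode({"title": title, "action": "render"})
    return f"{index_php}?{query}"


def fetch_rendered(index_php: str, title: str, timeout: float = 10.0) -> str:
    """Download the rendered body of a page as text."""
    with urlopen(render_url(index_php, title), timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch_rendered("http://wiki.example/index.php", "Article_title")` would request the same URL shown above, with a placeholder host name.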

Bryan
Shiny! That actually works better as it's run through a crawler...
Michael Stum