ansaurus

Question

Getting info from Wikipedia - how do I get HTML form?

Answer 1

A:

As far as I understand it, the Wikipedia software converts the Wiki markup into HTML when the page is requested. So using your current method, you'll need to deal with the results.

A good place to start is the Mediawiki API. You can also use http://pear.php.net/package/Text_Wiki to format the results retrieved via cURL.

Robert S. 2009-05-12 15:50:30

That link to Text_Wiki isn't working for me, something weird with the underscore?

Matt G 2009-05-25 11:18:01

I fixed it. :) Hope that works better.

Robert S. 2009-05-25 15:08:21

Answer 2

A:

Try looking at the printable version of the desired Wikipedia article in question.

In other words, change this line of your source code:

$url.=sprintf('&action=query&titles=%s&rvprop=content&prop=revisions&redirects=1', $article);

to something like:

$url.=sprintf('&action=query&titles=%s&printable=yes&redirects=1', $article);

Disclaimer: Have not tested, and this is just a guess at how your API might work.

HanClinto 2009-05-12 15:53:17

Answer 3

A:

There is a PEAR Wiki Filter that I have used and it does a very decent job.

Text Wiki

Phil

Phil Carter 2009-05-12 15:55:03

It probably won't render Wikipedia's myriad templates correctly, will it? (to do so, you'd either have to have copies of the templates locally, or it would have to fetch them from wikipedia)

Frank Farmer 2009-05-12 16:57:28

I know it will do the standard Wiki Mark Up, it's managed all the content I've ever put through it, so couldn't say with authority if it can do the templates or not. What the OP pasted was Wiki mark up and that will be converted.

Phil Carter 2009-05-12 21:49:21

What the OP pasted included "{{convert|9|km|mi|abbr=on}}", which is a template call.

Matt G 2009-05-25 11:15:44

Answer 4

+5 A:

The simplest solution would probably be to grab the page itself (e.g. http://en.wikipedia.org/wiki/Combination ) and then extract the content of <div id="content">, potentially with an xpath query.

Frank Farmer 2009-05-12 16:54:59

Nice idea - how would I do this I mean should I open a socket to the page? Also the issue is that I need to get portions of a page and sections as opposed to a full html dump of the content.

Ali 2009-05-13 08:54:15

ansaurus

tags:

views:

answers:

Getting info from Wikipedia - how do I get HTML form?

related questions