Here's what I'm trying to do using the Wikipedia (MediaWiki) API - http://en.wikipedia.org/w/api.php

  1. Do a GET on http://en.wikipedia.org/w/api.php?format=xml&action=opensearch&search=[keyword] to retrieve a list of suggested pages for the keyword

  2. Loop through each suggested page using a GET on http://en.wikipedia.org/w/api.php?format=json&action=query&export&titles=[page title]

  3. Extract any paragraphs found on the page into an array

  4. Do something with the array

I'm stuck on #3. I can see a bunch of JSON data that includes "\n\n" between paragraphs, but for some reason the PHP explode() function doesn't work.

Essentially I just want to grab the "meat" of each Wikipedia page (not titles or any formatting, just the content) and break it by paragraph into an array.

Any ideas? Thanks!

+1  A: 

In the raw JSON text, each \n is literally a backslash followed by the letter n, not a linefeed. Make sure you use single quotes around the separator in explode:

$parts = explode('\n\n', $text);

If you choose to use double quotes you'll have to escape the \ characters like so:

$parts = explode("\\n\\n", $text);
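To see the difference, here is a minimal sketch. The $raw string is a made-up stand-in for the undecoded JSON text, not real API output:

```php
<?php
// Hypothetical raw (undecoded) JSON fragment: each paragraph break is the
// two-character sequence backslash + n, written twice.
$raw = 'First paragraph.\n\nSecond paragraph.\n\nThird paragraph.';

// Single quotes: PHP leaves \n alone, so the separator matches the raw text.
$literal = explode('\n\n', $raw);

// Double quotes: "\n\n" means two real linefeeds, which the raw text does not contain.
$linefeeds = explode("\n\n", $raw);

echo count($literal), "\n";   // prints 3
echo count($linefeeds), "\n"; // prints 1 (no match, so the whole string comes back)
```

With double quotes explode() finds nothing to split on and returns the input as a single element, which is probably why it looked like it "doesn't work".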

On a side note: Why do you retrieve the data in two different formats? Why not go for only JSON or only XML?

Emil Vikström
Awesome - thanks! I was using double quotes instead of single quotes for the explode() function. I tried using XML for query #2, but the entire page content is inside one XML element so it doesn't help for extracting the paragraphs. At least with JSON there's '\n\n' between paragraphs.
Kane
So with XML there were two linefeeds instead of the literal \n\n? In that case you should be able to switch over to XML and run explode("\n\n", $text) with double quotes ;-)
Emil Vikström
I can't tell if there are two linefeeds in XML, I just see a blank line between paragraphs. Unfortunately the "\n\n" didn't seem to work. Thanks anyways! I should be fine with the XML / JSON mix.
Kane
If you use a JSON parser, the \n\n sequences will be converted into real linefeeds, and you can then use the proper "\n\n" separator in explode.
Bryan
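A rough sketch of that approach using PHP's built-in json_decode(). The $json string and its "text" key are invented for illustration and do not mirror the actual structure of the MediaWiki response:

```php
<?php
// Hypothetical JSON document. Single quotes keep the \n escapes as
// backslash + n, just as they appear in a raw HTTP response body.
$json = '{"text":"First paragraph.\n\nSecond paragraph."}';

// json_decode() turns the JSON escape \n into a real linefeed character.
$data = json_decode($json, true);

// After decoding, the double-quoted separator is the right one.
$paragraphs = explode("\n\n", $data['text']);

print_r($paragraphs);
```

Decoding first also takes care of other JSON escapes (\" and \uXXXX, for example), which splitting the raw text would leave in place.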
A: 

Do you mind sharing your code? I am working on a similar project. Thanks.

Edd
Sure dude, send me an email - kanewebm[at]gmail[dot]com
Kane