I am trying to use MediaWiki's API to get articles in XML format and include them on my page. I wrote a simple script that fetches the XML representation of an article using ?action=parse&page=Page_Name&format=xml requests. The code is as follows:
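// Abort when no page name was passed (e.g. the script was called directly).
if (!isset($_GET["page"]) || $_GET["page"] === '') die("Page not specified (possibly direct call)");
$pagename = $_GET["page"];
// URL-encode the page name so spaces and special characters survive in the query string.
$handle = @fopen("mediawiki/api.php?action=parse&page=".urlencode($pagename)."&format=xml", "r");
if ($handle) {
    $buffer = ''; // initialize before appending to it
    while (!feof($handle)) {
        $buffer .= fgets($handle);
    }
    // Decodes the entities in the whole response before it is parsed as XML.
    $buffer = html_entity_decode($buffer);
    /*
    echo $buffer;
    */
    $xml = simplexml_load_string($buffer);
    foreach ($xml->parse->children() as $child) {
        switch ($child->getName()) {
            case "text":
                echo $child->asXML()."<br/>";
                break;
            case "categories":
                echo "<h3>Categories this project is related to: </h3><br/>";
                foreach ($child->children() as $grandChild) {
                    echo $grandChild." | ";
                }
                break;
        }
    }
    fclose($handle);
}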
Now the problem is that I'm getting very strange output. Any <a name="" href=""></a> is converted to <a name="" href=""/>, which turns all the following text into a link (I guess because there is no closing </a> tag). This happens in both Mozilla Firefox and Google Chrome. I suspect that $buffer = html_entity_decode($buffer); causes this problem. Is there a parameter for html_entity_decode() that I should specify to avoid this? Or is it caused by some other error or misuse of html_entity_decode() in my code?
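To make the symptom reproducible without a wiki, here is a minimal sketch. It assumes a hand-written stand-in for the API response (the $sample string is hypothetical, not real API output) and applies the same decode-then-parse order as my code. As far as I can tell, the self-closing tag shows up once the decoded markup is parsed and re-serialized by SimpleXML:

```php
<?php
// Hypothetical stand-in for the API response: the article HTML is
// entity-escaped inside <text>. This is an assumption, not real output.
$sample = '<api><parse><text>&lt;p&gt;&lt;a name="top" href="#top"&gt;&lt;/a&gt;Hello&lt;/p&gt;</text></parse></api>';

// Same order as in my code: decode the whole response first, then parse it.
$decoded = html_entity_decode($sample);
$xml = simplexml_load_string($decoded);

// After decoding, <p> and <a> are real XML elements, so asXML()
// re-serializes them and the empty <a> element loses its closing tag.
$output = $xml->parse->text->asXML();
echo $output, "\n";
```

If this sketch is representative, the anchor in $output no longer has a separate closing tag, which matches what I see in the browser.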
(To see the XML output of the wiki's API, you can try http://en.wikipedia.org/w/api.php?action=parse&page=No_Such_Page&format=xml with different page parameters.)
POSSIBLE SOLUTION: I didn't want to switch to JSON, as Jordan suggested, so I came up with this solution. I simply moved html_entity_decode into the case "text": block, so that block now contains echo html_entity_decode($child->asXML())."<br/>"; instead. Do you think this is a reasonable approach?
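For comparison, here is a small sketch of the decode-after-parsing idea. It again assumes a hand-written $sample string in place of the real API response (hypothetical data), and it also tries a second variant that I have not used in my page: casting the element to a string, which should return the text content with the entities already unescaped by the XML parser.

```php
<?php
// Hypothetical stand-in for the API response (an assumption, not real output).
$sample = '<api><parse><text>&lt;p&gt;&lt;a name="top" href="#top"&gt;&lt;/a&gt;Hello&lt;/p&gt;</text></parse></api>';

// Parse the response as-is; the article HTML stays one escaped text node.
$xml = simplexml_load_string($sample);

// Variant 1: what I do now. Serialize the <text> element, then decode it.
// The result still includes the surrounding <text> wrapper.
$viaDecode = html_entity_decode($xml->parse->text->asXML());

// Variant 2: cast the element to string. The parser has already
// unescaped the text, so no html_entity_decode() call is involved.
$viaCast = (string) $xml->parse->text;

echo $viaDecode, "\n";
echo $viaCast, "\n";
```

In both variants the empty anchor should keep its closing tag, because the HTML is never turned into XML elements. The cast variant additionally drops the <text> wrapper, which might be preferable when echoing into a page.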