I'm trying to create a database of World of Warcraft gems. If I go to this page:

http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items

And go to View Source in Firefox, I see a tonne of XML data which is exactly what I want. I wrote up this quick script to try and parse some of it:

<?php

$gemUrls = array(
                 'Blue' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=blue&searchType=items',
                 'Red' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=red&searchType=items',
                 'Yellow' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=yellow&searchType=items',
                 'Meta' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=meta&searchType=items',
                 'Green' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=green&searchType=items',
                 'Orange' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=orange&searchType=items',
                 'Purple' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items',
                 // NOTE: this repeats the Purple query (subTp=purple), probably a copy-paste slip
                 'Prismatic' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items'
                 );


// Get blue gems

$blueGems = file_get_contents($gemUrls['Blue']);

$xml = new SimpleXMLElement($blueGems);

echo $xml->items[0]->item;

?>

But I get a load of errors like this:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 20: parser error : xmlParseEntityRef: no name in C:\xampp\htdocs\WoW\index.php on line 19

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: if(Browser.iphone && Number(getcookie2("mobIntPageVisits")) < 3 && getcookie2( in C:\xampp\htdocs\WoW\index.php on line 19

I'm not sure what's wrong. I think file_get_contents() is returning data that isn't XML, maybe some JavaScript, judging by the iPhone parts in the errors.

Is there any way to just get back the XML from that page? Without any HTML or anything?

Thanks :)

A: 

What is returned is XHTML: it's XML-ish, but not well-formed enough for an XML parser. To use SimpleXMLElement you need well-formed XML. From the documentation of the constructor:

Method signature:

__construct ( string $data [, int $options [, bool $data_is_url 
             [, string $ns [, bool $is_prefix ]]]] )

$data is described as:

A well-formed XML string or the path or URL to an XML document if data_is_url is TRUE.
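To see what that requirement means in practice, here is a minimal sketch (assuming PHP 7.1 or later; `parseXmlOrNull` is a hypothetical helper name, not part of SimpleXML). It switches libxml to internal error handling so the warnings are not printed, and catches the exception the constructor throws when the input is not well-formed:

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement for well-formed XML, or null otherwise.
function parseXmlOrNull(string $data): ?SimpleXMLElement
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    try {
        return new SimpleXMLElement($data);
    } catch (Exception $e) {
        // The constructor throws when the string cannot be parsed as XML.
        return null;
    } finally {
        // Discard collected errors and restore the caller's setting.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

$good = parseXmlOrNull('<items><item name="example"/></items>');
// A bare "&&", as found in inline JavaScript, is not legal in XML.
$bad = parseXmlOrNull('<script>if (a && b) {}</script>');

var_dump($good instanceof SimpleXMLElement); // bool(true)
var_dump($bad);                              // NULL
```

If you want to log the individual parse errors, call libxml_get_errors() before they are cleared.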

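So an arbitrary web page will not satisfy this parser. Your first warning, xmlParseEntityRef: no name, is a typical symptom: the page's inline JavaScript contains a bare &&, and an unescaped & is not allowed in XML. You ask: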

"Is there any way to just get back the XML from that page? Without any HTML or anything?"

You could contact the webmasters and find out whether they offer an XML view of the data. Failing that, you could use a plain HTML parser to try to extract the data. I like PHP Simple HTML DOM Parser. Also check out the question "How to implement a web scraper in PHP?"
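For the HTML-parser route, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument::loadHTML() plus DOMXPath) rather than Simple HTML DOM Parser, which is a separate library. The sample markup, the "item" class and the `extractItemNames` helper are all hypothetical; the real page's structure would have to be inspected before writing the actual query.

```php
<?php
// Sketch only: the markup, the "item" class and the helper name are hypothetical.
function extractItemNames(string $html): array
{
    $doc = new DOMDocument();

    // loadHTML() tolerates markup that is not well-formed XML; keep its
    // warnings in libxml's internal buffer instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];

    // Select every <a> inside an element whose class list contains "item".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " item ")]//a';
    foreach ($xpath->query($query) as $link) {
        $names[] = trim($link->textContent);
    }

    return $names;
}

// Hypothetical sample page, including the kind of inline script that broke SimpleXML.
$html = '<html><body>
  <div class="item"><a href="#">Example Gem 1</a></div>
  <div class="item featured"><a href="#">Example Gem 2</a></div>
  <script>if (a && b) { run(); }</script>
</body></html>';

print_r(extractItemNames($html));
```

Unlike SimpleXMLElement, the HTML parser does not reject the page because of the inline script, which is why this route works on ordinary web pages.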

artlung