I like the XMLReader class for its simplicity and speed. But I also like the xml_parse family of functions, because they allow better error recovery. It would be nice if the XMLReader class threw exceptions for things like invalid entity refs instead of just issuing a warning.
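One way to approximate that behaviour is to have libxml collect its errors and turn the first one into an exception after reading. This is only a rough sketch: the function name `countElementsOrThrow` is made up for illustration, and it assumes the errors raised while XMLReader parses show up in `libxml_get_errors()`.

```php
<?php
// Hypothetical helper: counts element nodes, throws if libxml reported an error.
function countElementsOrThrow(string $xml): int
{
    // Collect libxml errors internally instead of emitting them as warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);

    $count = 0;
    // read() may additionally emit a generic PHP warning on failure, hence the @.
    while (@$reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $count++;
        }
    }
    $reader->close();

    // Grab whatever libxml recorded, then restore the previous setting.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($errors) {
        throw new RuntimeException(trim($errors[0]->message), $errors[0]->code);
    }
    return $count;
}
```

Calling it on something like `<a><b></a>` should then raise a `RuntimeException` you can catch, rather than a warning.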

+3  A: 

SimpleXML seems to do a good job for me.

ctcherry
Does SimpleXML fare well on, say, 1.5 gig files?
I don't know if XML fares well on 1.5 gig files. Use a database?
nickf
Clients deliver us files that big that we have to parse through.
Personally I've never dealt with files that large. However, SimpleXML seems to load the entire file into memory, which in your case might be quite a drawback. For a file of that size, XML is probably not the optimal storage format. What does this file contain?
ctcherry
A: 

I mostly stick to SimpleXML, at least whenever PHP5 is available for me.

http://www.php.net/simplexml

UltimateBrent
+2  A: 

I'd avoid SimpleXML if you can. Though it looks very tempting by getting to avoid a lot of "ugly" code, it's just what the name suggests: simple. For example, it can't handle this:

<p>
    Here is <strong>a very simple</strong> XML document.
</p>

Bite the bullet and go to the DOM Functions. Their power far outweighs the little bit of extra complexity. If you're at all familiar with DOM manipulation in JavaScript, you'll feel right at home with this library.

nickf
That's not valid XML.
Mark Biek
Unfortunately it is often easier to fix other people's mistakes than having them correct the errors.
I'm not sure what you mean by that. Was that the rationale behind accepting an incorrect answer?
Mark Biek
Seriously, this is not a valid reason not to use SimpleXML. SimpleXML is the solution if you have PHP5... and valid XML.
dawnerd
I have to deal with feeds from clients that don't really know what XML is, don't understand character encoding, and often outsource the generation of the feed to someone that f's it up. I could say "your stuff's borken", but then I lose their data by not accepting it.
@mark: "That's not valid XML". Really? I won't argue the point, but I'd be interested to know why not. Also, it doesn't change the validity of my answer. The SimpleXML functions are underpowered. Try validating against a DTD, or using XIncludes... You'll be much better off with DOM from the start.
nickf
@nickf Because you have a tag, then CDATA, then another tag without closing the first tag. Your example won't even validate as XHTML.
Mark Biek
@mike I'm sorry but I have no idea what you're talking about.
Mark Biek
@Mark - Are you sure about that? Paste my example into a validator and it works. (http://www.validome.org/xml/validate/). The DTD for that element would look like this: <!ELEMENT p (#PCDATA | strong)*>
nickf
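For anyone who wants to check this locally rather than in an online validator, here is a small sketch using the DOM functions. The `p` declaration is the one from the comment above; the `strong` declaration is my own assumption, added so that every element in the document is declared.

```php
<?php
// The example document with an internal DTD subset.
// <!ELEMENT strong (#PCDATA)> is an assumed declaration, not from the thread.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE p [
  <!ELEMENT p (#PCDATA | strong)*>
  <!ELEMENT strong (#PCDATA)>
]>
<p>
    Here is <strong>a very simple</strong> XML document.
</p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// validate() checks the loaded document against its own DTD.
$valid = $doc->validate();

echo $valid ? "valid\n" : "not valid\n";
```

This is also one of the things SimpleXML has no direct equivalent for, which is the point about DTD validation made earlier in the comments.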
Um that is valid XML...
Jordie
Wow, really can't believe people voted this down and think it's invalid XML.
cletus
yeah.. it really made me second-guess myself for a while there.
nickf
I know this is a really old question, but I'd be interested to know how SimpleXML fails at parsing this?
Renesis
@Renesis - The value returned from `$xml->p` is `"Here is XML document"`. The `children()` function just returns the `<strong>` element, not the text nodes, and there's no way to actually build the above document without switching to the DOM functions.
nickf
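A quick way to see the difference for yourself is the sketch below. It assumes the `<p>` element is the document root (so `$sx` plays the role of `$xml->p` above), and the variable names are just for illustration.

```php
<?php
$xml = '<p>Here is <strong>a very simple</strong> XML document.</p>';

// SimpleXML: the string cast only sees the element's own text nodes,
// and children() only yields child elements.
$sx = simplexml_load_string($xml);
$simpleText = (string) $sx;
$simpleChildren = count($sx->children());

// DOM: text nodes and elements are all child nodes, in document order.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domText = $doc->documentElement->textContent;
$domChildCount = $doc->documentElement->childNodes->length;

var_dump($simpleText, $simpleChildren, $domText, $domChildCount);
```

With SimpleXML the text inside `<strong>` is missing from the string value and you can't tell where the element sat relative to the surrounding text. The DOM version exposes the text node, the element and the second text node as three separate children.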
+1  A: 

There are at least four options when using PHP5 to parse XML files. The best option depends on the complexity and size of the XML file.

There’s a very good 3-part article series titled ‘XML for PHP developers’ at IBM developerWorks.

“Parsing with the DOM, now fully compliant with the W3C standard, is a familiar option, and is your choice for complex but relatively small documents. SimpleXML is the way to go for basic and not-too-large XML documents, and XMLReader, easier and faster than SAX, is the stream parser of choice for large documents.”

Chaoley
+1  A: 

SimpleXML and DOM work seamlessly together, so you can work with the same XML through either SimpleXML or DOM.

For example:

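// Build a small document with SimpleXML.
$simplexml = simplexml_load_string("<xml></xml>");
$simplexml->simple = "it is simple.";

// Get a DOM view of the same node and add a child through DOM.
$domxml = dom_import_simplexml($simplexml);
$node = $domxml->ownerDocument->createElement("dom", "yes, with DOM too.");
$domxml->ownerDocument->firstChild->appendChild($node);

// The element added via DOM is visible from the SimpleXML object.
echo (string)$simplexml->dom;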

You will get the result:

"yes, with DOM too."

This works because when you import the object (into either SimpleXML or DOM), it uses the same underlying PHP object by reference.

I figured this out when I was trying to correct some of the errors in SimpleXML by extending/wrapping the object.

See http://code.google.com/p/blibrary/source/browse/trunk/classes/bXml.class.inc for examples.

This is really good for small chunks of XML (under 2MB), as DOM/SimpleXML pull the full document into memory with some additional overhead (think 2x or 3x the document size). For large XML chunks (over 2MB) you'll want to use XMLReader/XMLWriter to parse SAX style, with low memory overhead. I've used 14MB+ documents successfully with XMLReader/XMLWriter.
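To give an idea of what the streaming style looks like, here is a minimal XMLReader sketch. The `<items>`/`<item>` structure and the `id` attribute are invented for the example; for a real file you would call `$reader->open($path)` instead of passing a string.

```php
<?php
// Sample data; a real large document would be opened from disk instead.
$xml = '<items><item id="1">first</item><item id="2">second</item></items>';

$reader = new XMLReader();
$reader->XML($xml);

$items = [];
// read() advances one node at a time, so the whole tree is never built.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Store each item's text content keyed by its id attribute.
        $items[$reader->getAttribute('id')] = $reader->readString();
    }
}
$reader->close();

print_r($items);
```

Only the current node is held at any point, which is why memory use stays low regardless of file size.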