I have millions of XML files, each about 1-2 megabytes in size. They're all well-formed, and many have also been validated against their schema (confirmed with libxml2).
All were created by the same app, so they're in a consistent format (though this could theoretically change in the future).
I want to check the value of one element in each file from within a Perl script. Speed is important (I'd like to take well under a second per file), and as noted I already know the files are well-formed.
I am sorely tempted to simply 'open' the files in Perl and scan through until I see the element I am looking for, grab the value (which is near the start of the file), and close the file.
On the other hand, I could use an XML parser (which might protect me from future changes to the XML formatting) but I suspect it will be slower than I'd like.
Can anyone recommend an appropriate approach and/or parser?
Thanks in advance.
Update
Here's the structure/complexity of the data I am trying to pull out:
<doc>
...
<someparentnode attrib="notme" attrib2="5">
<node>Not this one</node>
</someparentnode>
<someparentnode attrib="pickme" attrib2="5">
<node>This is the data I want</node>
</someparentnode>
<someparentnode attrib="notme"
attrib2="reallyreallylonglineslikethisonearewrapped">
<node>Not this one either and it may be
wrapped too.</node>
</someparentnode>
...
</doc>
The hierarchy goes several levels deeper than that, but I think this covers the sort of thing I am trying to extract.
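
For reference, here is a minimal sketch of the "just open and scan" approach I'm tempted by. It uses only core Perl. It assumes the value sits within the first 64 KB of the file, that `<node>` is the first child of the matching `<someparentnode>`, and that attribute values never contain a literal `>`. The name `grab_value` and the 64 KB limit are my own placeholders. It does no entity decoding or character-set handling, which is exactly the sort of thing a real parser would do for me.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of <node> inside the first
# <someparentnode attrib="pickme" ...>, or nothing if it is not found
# within the first $max_bytes bytes of the file.
sub grab_value {
    my ($path, $max_bytes) = @_;
    $max_bytes //= 64 * 1024;    # assumption: value is near the start

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buf = '';
    my $got = read($fh, $buf, $max_bytes);
    close $fh;
    die "Read failed for $path: $!" unless defined $got;

    # /s lets .*? span newlines, so wrapped tags and wrapped text still match.
    # "attrib\s*=" cannot match attrib2 because of the digit before the "=".
    if ($buf =~ m{<someparentnode\b[^>]*\battrib\s*=\s*"pickme"[^>]*>\s*<node>(.*?)</node>}s) {
        my $value = $1;
        $value =~ s/\s+/ /g;          # collapse line wraps into single spaces
        $value =~ s/^ | $//g;         # trim leading/trailing space
        return $value;
    }
    return;
}

# Only runs when file names are given on the command line.
if (@ARGV) {
    for my $file (@ARGV) {
        my $value = grab_value($file);
        print "$file: ", (defined $value ? $value : '(not found)'), "\n";
    }
}
```

Reading a fixed-size chunk rather than going line by line means the wrapped attributes and wrapped text in the sample above don't need special handling, but it silently misses the element if it ever moves past the chunk boundary.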