Never ever use Regex to handle markup languages.
The original version of this answer (see below) used XML::XPath
. Grant McLean said in the comments:
XML::XPath
is an old and unmaintained module. XML::LibXML
is a modern, maintained module with an almost identical API and it's faster too.
so I made a new version that uses XML::LibXML
(thanks, Grant):
use warnings;
use strict;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(location => 'articles.xml');
my $xp = XML::LibXML::XPathContext->new($doc->documentElement);
my $xpath = '/articles/article[position() < 4]';
foreach my $article ( $xp->findnodes($xpath) ) {
# now do something with $article
print $article.": ".$article->getName."\n";
}
For me this prints:
XML::LibXML::Element=SCALAR(0x346ef90): article
XML::LibXML::Element=SCALAR(0x346ef30): article
XML::LibXML::Element=SCALAR(0x346efa8): article
Links to the relevant documentation:
Original version of the answer, based on the XML::XPath
package:
use warnings;
use strict;
use XML::XPath;
my $xp = XML::XPath->new(filename => 'articles.xml');
my $xpath = '/articles/article[position() < 4]';
foreach my $article ( $xp->findnodes($xpath)->get_nodelist ) {
# now do something with $article
print $article.": ".$article->getName ."\n";
}
which prints this for me:
XML::XPath::Node::Element=REF(0x38067b8): article
XML::XPath::Node::Element=REF(0x38097e8): article
XML::XPath::Node::Element=REF(0x3809ae8): article
Have a look at the docs to find out what you can do with them.