Document order is defined in the XPath 1.0 specification as:
There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node. Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities).
In other words, document order is the order in which things occur in the XML document. The XML::XPath module produces results in document order. For example:
#! /usr/bin/perl
use warnings;
use strict;
use XML::XPath;
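# XPath template selecting the <Entity> whose <EntityName> matches a given name
my $entity_template = "/Entities"
                    . "/Entity"
                    . "[EntityName='!!NAME!!']"
                    ;
# Union of both kinds of table, relative to an <Entity> context node
my $tables_path = join "|" =>
  qw( ./Tables/DataTables/DataTable
      ./Tables/OtherTables/OtherTable );
# Parse the XML that follows __DATA__
my $xp = XML::XPath->new(ioref => *DATA);
foreach my $ename (qw/ foo bar /) {
  print "$ename:\n";
  # Substitute the entity name into the template
  (my $path = $entity_template) =~ s/!!NAME!!/$ename/g;
  foreach my $n ($xp->findnodes($path)) {
    # Evaluate the union with the matched <Entity> as context node
    foreach my $t ($xp->findnodes($tables_path, $n)) {
      print $t->toString, "\n";
    }
  }
}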
__DATA__
The first expression searches for <Entity> elements that have an <EntityName> child whose string-value is the selected entity name. From each match, we look for <DataTable> or <OtherTable> elements.
Given input of (placed after the __DATA__ line)
<Entities>
  <Entity>
    <EntityName>foo</EntityName>
    <EntityType>type1</EntityType>
    <Tables>
      <DataTables>
        <DataTable>1</DataTable>
        <DataTable>2</DataTable>
      </DataTables>
      <OtherTables>
        <OtherTable>3</OtherTable>
        <OtherTable>4</OtherTable>
      </OtherTables>
    </Tables>
  </Entity>
  <Entity>
    <EntityName>bar</EntityName>
    <EntityType>type2</EntityType>
    <Tables>
      <DataTables>
        <DataTable>5</DataTable>
        <DataTable>6</DataTable>
      </DataTables>
      <OtherTables>
        <OtherTable>7</OtherTable>
        <OtherTable>8</OtherTable>
      </OtherTables>
    </Tables>
  </Entity>
</Entities>
the output is
foo:
<DataTable>1</DataTable>
<DataTable>2</DataTable>
<OtherTable>3</OtherTable>
<OtherTable>4</OtherTable>
bar:
<DataTable>5</DataTable>
<DataTable>6</DataTable>
<OtherTable>7</OtherTable>
<OtherTable>8</OtherTable>
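The example above lists the operands of the union in the same order as the elements appear in the document, so on its own it cannot show whether the order comes from the document or from the expression. A minimal sketch that swaps the operands may make the point clearer. The inline XML is a trimmed-down, hypothetical stand-in for the input above, and the variable names are mine:

```perl
#! /usr/bin/perl
use warnings;
use strict;
use XML::XPath;

# Hypothetical, trimmed-down input: one table of each kind.
my $xml = <<'XML';
<Entity>
  <Tables>
    <DataTables><DataTable>1</DataTable></DataTables>
    <OtherTables><OtherTable>3</OtherTable></OtherTables>
  </Tables>
</Entity>
XML

my $xp = XML::XPath->new(xml => $xml);

# Same union as before, but with OtherTable listed first.
my $swapped = join "|" =>
  qw( /Entity/Tables/OtherTables/OtherTable
      /Entity/Tables/DataTables/DataTable );

# Collect the element names in the order findnodes returns them.
my @names = map { $_->getName } $xp->findnodes($swapped);
print "@names\n";
```

If the module returns the union in document order as described, DataTable should be printed before OtherTable even though the expression names OtherTable first.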
To extract the string-values (the “inner text”), change $tables_path to
my $tables_path = ". / Tables / DataTables / DataTable / text() |
. / Tables / OtherTables / OtherTable / text()";
Yes, that's repetitive, but XML::XPath implements XPath 1.0, which does not allow a parenthesized union as a step in the middle of a path.
Output:
foo:
1
2
3
4
bar:
5
6
7
8
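Another way to get the inner text, without repeating text() in each branch of the union, is to keep selecting the elements and ask each one for its string-value. This is only a sketch: it assumes the element nodes returned by findnodes provide a string_value method, and the inline XML is again a trimmed-down, hypothetical stand-in for the input above.

```perl
#! /usr/bin/perl
use warnings;
use strict;
use XML::XPath;

# Hypothetical, trimmed-down input.
my $xml = <<'XML';
<Tables>
  <DataTables><DataTable>1</DataTable><DataTable>2</DataTable></DataTables>
  <OtherTables><OtherTable>3</OtherTable></OtherTables>
</Tables>
XML

my $xp = XML::XPath->new(xml => $xml);

# Select the elements themselves; no text() step is needed.
my $tables_path = join "|" =>
  qw( /Tables/DataTables/DataTable
      /Tables/OtherTables/OtherTable );

# string_value concatenates the text content of each element.
my @values = map { $_->string_value } $xp->findnodes($tables_path);
print "$_\n" for @values;
```

Note that for elements with nested children the string-value includes the text of all descendants, so this matches the text() version only when the selected elements contain plain text, as they do here.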