Folks,
There is so much info out there on HTML::TreeBuilder that I'm surprised I can't find the answer; hopefully I'm not just missing it.
What I'm trying to do is process the elements that sit between successive anchor tags. Given an HTML doc like this:
<html>
<body>
<a id="111" name="111"></a>
<p>something</p>
<p>something</p>
<p>something</p>
<a href="xxx">something</a>
<a id="222" name="222"></a>
<p>something</p>
<p>something</p>
<p>something</p>
....
</body>
</html>
I want to get the info about the first anchor tag (111), process the three p tags that follow it, then move on to the next anchor tag (222), process its p tags, and so on.
It's easy to get to each anchor tag:
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new();
$tree->parse_file("index-01.htm");

# Visit every 'a' element; only the ones carrying an id mark a section
foreach my $atag ( $tree->look_down( '_tag', 'a' ) ) {
    if ( defined $atag->attr('id') ) {
        # Found 'a' tag, now process the p tags until the next 'a'
    }
}
But once I've found that tag, how do I then get all the p tags up to the next anchor?
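For reference, here is a rough sketch of the kind of thing I'm after, not a known-good solution. It assumes the anchors and the p tags are siblings under body (as in the sample above), that only anchors with an id start a new section, and it uses an inline string as a made-up stand-in for index-01.htm. The %paras_for hash is just a name I picked for illustration. It leans on HTML::Element's right() method, which in list context returns the siblings that follow a node.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical inline sample standing in for index-01.htm
my $html = <<'HTML';
<html><body>
<a id="111" name="111"></a>
<p>one</p>
<p>two</p>
<a href="xxx">link</a>
<a id="222" name="222"></a>
<p>three</p>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my %paras_for;    # anchor id => array ref of paragraph texts (illustrative name)

foreach my $atag ( $tree->look_down( '_tag', 'a' ) ) {
    my $id = $atag->attr('id');
    next unless defined $id;    # anchors without an id do not start a section
    $paras_for{$id} = [];

    # right() in list context: all siblings after $atag, in document order
    foreach my $sib ( $atag->right ) {
        next unless ref $sib;    # plain text nodes are bare strings, skip them
        # Stop at the next anchor that has an id (assumed section boundary)
        last if $sib->tag eq 'a' && defined $sib->attr('id');
        push @{ $paras_for{$id} }, $sib->as_text if $sib->tag eq 'p';
    }
}

foreach my $id ( sort keys %paras_for ) {
    print "$id: ", join( ', ', @{ $paras_for{$id} } ), "\n";
}

$tree->delete;    # free the tree once done
```

If the p tags were nested inside other elements rather than being direct siblings of the anchor, this sibling walk would miss them, so it only fits documents shaped like the sample.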
TIA!!