I'm trying to get XPath to return an attribute value after first matching on the tag's contents. For example, given

<select name="xxx">
  <option=bla>123</option>
  <option=blubb>456</option>
</select>

I want to say: "find the option tag containing 456 within the select tag named 'xxx'."

I do this:

my $xp = XML::XPathEngine->new();
my $tree = HTML::TreeBuilder::XPath->new_from_content($mech->content);
my $search = $xp->find('//select[@name="xxx"]/option[.=~ /' . $re . '/]/@value', $tree);

($re is a regexp for the option part.)

All I get from the last line (the $xp->find call) is this error:

Can't locate object method "getRootNode" via package "HTML::TreeBuilder::XPath::Attribute" at /usr/lib/perl5/site_perl/5.8.8/XML/XPathEngine/NodeSet.pm line 90, <MYFILE> line 1.

What's wrong here? Is that module faulty? Is my XPath faulty? Should I use something else that "just works"?

PS. I don't use XML::Parser because it complains about a perfectly valid page:

mismatched tag at line 9, column 3, byte 427 at /usr/lib/perl5/vendor_perl/5.8.8/i586-linux-thread-multi/XML/Parser.pm line 187


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <title>aaaa</title>
<link href="/x/include/main.css" type="text/css" rel="stylesheet">
<link href="/x/images/favicon.ico" rel="shortcut icon">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<script type="text/javascript" src="/x/include/layout.js"></script>
</head>

<body style=
.....
+2  A: 

Well, your example HTML doesn't have any value attributes, and you don't give a complete example script. But the following script works fine for me:

use XML::XPathEngine;
use HTML::TreeBuilder::XPath;

my $content = <<'';
<select name="xxx">
  <option value=bla>123</option>
  <option value=blubb>456</option>
</select>

my $re = '456';

my $xp = XML::XPathEngine->new();
my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
my $search = $xp->find('//select[@name="xxx"]/option[.=~ /' . $re . '/]/@value', $tree);

print "$search\n";

At a guess, you've put something in $re that doesn't work.

I'm using:

  • HTML::TreeBuilder 3.23
  • HTML::TreeBuilder::XPath 0.08
  • XML::XPathEngine 0.08

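The reason XML::Parser doesn't work is that HTML is not XML. In HTML, tags such as <link> and <meta> have no closing tag, so by the time an XML parser reaches </head> it still considers them open. That is probably the "mismatched tag" it reports.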
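If an exact text match is enough, the query can also be run on the tree itself, without creating a separate XML::XPathEngine object or building a regexp into the path. The following is only a minimal sketch under that assumption: it uses the findvalue method that HTML::TreeBuilder::XPath provides, with the same sample markup as above, and it is not a diagnosis of the getRootNode error.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Same sample markup as in the script above, with value attributes present.
my $content = <<'HTML';
<select name="xxx">
  <option value="bla">123</option>
  <option value="blubb">456</option>
</select>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($content);

# Exact match on the option's text, then return its value attribute as a string.
my $value = $tree->findvalue('//select[@name="xxx"]/option[.="456"]/@value');
print "$value\n";

# HTML::TreeBuilder trees should be deleted explicitly when no longer needed.
$tree->delete;
```

Whether this avoids the error in the original script depends on the installed module versions and on what $re contains, so treat it as something to try rather than a fix.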

cjm
Yep, your example works. I don't know why mine doesn't. OK, the option tags are messed up in my example, but my original contains the value attributes. I have now switched completely to regular expressions for this part, since the HTML I needed to parse didn't validate anyway. Thanks for your help.
Marki