Hello all,

I am trying to match an HTML element, but I don't think it's matching, since $titles is empty. Can anyone correct me?

My preg_match:

   preg_match_all("~<td align=\"left\" width=\"50%\">[^<]*. <b><a href=\"(.*?)\">[^<]*</a>~i", $main, $titles);

Example HTML to match:

//<td align="left" width="50%">1. <b><a title="Wat" href="http://www.exmple.com/q.html">Wat</a></b><br></td>

Am I missing something?

Thanks all for any help

+1  A: 

There's nothing in your pattern to match title="Wat" in the <a> tag; the pattern expects href to come immediately after <a.

I'd suggest not using a regex to parse it though. I'm not too familiar with PHP but I'm sure it already has something that will do most of the work for you.

Corey
If the document you're searching is valid XHTML you can use the built-in SimpleXML parser, but many times it is not.
Austin Fitzpatrick
@Corey - aaah! Thanks, I did not even notice that. Dear God! Putting `title=\"[^<]*\"` in seems to get it to work.
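
For anyone landing here later, a minimal sketch of a pattern that does not depend on attribute order. It assumes the markup looks like the example in the question; `$main` here is just that sample line, and `[^>]*?` is my own addition to skip whatever attributes come before href. It is still a regex, so it will break if the page layout changes.

```php
<?php
// Sample markup from the question; $main stands in for the fetched page.
$main = '<td align="left" width="50%">1. <b><a title="Wat" href="http://www.exmple.com/q.html">Wat</a></b><br></td>';

// [^>]*? skips any attributes (such as title) that appear before href,
// so the pattern no longer requires href to be the first attribute.
$pattern = '~<td align="left" width="50%">[^<]*<b><a\b[^>]*?\bhref="([^"]*)"[^>]*>[^<]*</a>~i';

preg_match_all($pattern, $main, $titles);

// $titles[0] holds the full matches, $titles[1] the captured href values.
print_r($titles[1]);
```

The captured URLs end up in `$titles[1]`, not `$titles` itself, which is easy to miss with preg_match_all.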
Abs
+1  A: 

As I said in my comment, regex is rarely if ever the proper tool to use when trying to parse HTML. I'm going to use an example of Zend_Dom_Query, one of the components in Zend Framework, simply because I haven't seen it recommended on one of these questions yet. So...

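$dom = new Zend_Dom_Query($htmlHaystack);
// query() expects a CSS selector; queryXpath() takes an XPath expression.
// '//td//a' is used because the <a> sits inside a <b> in the example HTML.
$anchors = $dom->queryXpath('//td//a[@title]');
if (count($anchors) > 0)
{
  $titles = array();
  foreach ($anchors as $element)
  {
     $titles[] = $element->getAttribute('title');
  }
}
else
{
  $titles = null;
}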
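
If Zend Framework is not available, the same idea can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath). This is only a sketch under assumptions: `$htmlHaystack` is the sample cell from the question wrapped in a table so the parser keeps the `<td>`, and the `$links` array layout is just something I picked for illustration.

```php
<?php
// Sample markup from the question; $htmlHaystack stands in for the real page.
$htmlHaystack = '<table><tr><td align="left" width="50%">1. <b><a title="Wat" href="http://www.exmple.com/q.html">Wat</a></b><br></td></tr></table>';

$doc = new DOMDocument();
// Real-world pages are often invalid; collect parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($htmlHaystack);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$links = array();
// '//td//a[@href]' matches anchors at any depth inside a <td>.
foreach ($xpath->query('//td//a[@href]') as $anchor) {
    $links[] = array(
        'href'  => $anchor->getAttribute('href'),
        'title' => $anchor->getAttribute('title'),
    );
}

print_r($links);
```

Unlike SimpleXML, loadHTML() does not need the input to be valid XHTML, which is why it tends to cope better with scraped pages.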
prodigitalson
A: 
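$string='<td align="left" width="50%">1. <b><a title="Wat" href="http://www.exmple.com/q.html">Wat</a></b><br></td>';
// Split on the closing anchor tag so each piece holds at most one link.
$s = explode("</a>",$string);
foreach($s as $k){
   // Only pieces that contain an href are of interest.
   if (strpos($k,"href")!==FALSE){
        // Strip everything up to href=" and everything from "> onwards, leaving the URL.
        echo preg_replace('/.*href="|">.*/ms',"",$k);
   }
}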
ghostdog74