Hi, I need to retrieve the innerHTML of an anchor tag using a regular expression in PHP. Suppose I have markup like this:

   <div class="detailsGray"><span class="detailEmail"><a href="http://example.com" class="fontLink">[email protected]</a></span></div>

I tried to get it with:

preg_match_all('/class=\"fontLink"\>.*\<\/a\>/', $raw, $matches);

but it is not working. I only need to retrieve [email protected], using RegExp and preg_match_all(). Thanks.

A: 

Looking at it, the regex is a bit of a mess. Try this instead:

'/class=\"fontLink\">.*?<\/a>/'

As far as I know, there is nothing special about < and > in regex, so they don't need escaping.

You don't want .* because it is greedy: it goes straight to the end of the line and then works backwards, so it stops at the last </a> on the line. .*? is lazy: it takes one more character at a time and stops at the first </a>.
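
To see the difference, here is a minimal sketch. The two-link input is made up for illustration, and the capture group is only there so the matched text can be printed:

```php
<?php
// Hypothetical input with two links on one line, to make the difference visible.
$raw = '<a class="fontLink">first</a> and <a class="fontLink">second</a>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST </a>.
preg_match('/class="fontLink">(.*)<\/a>/', $raw, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </a>.
preg_match('/class="fontLink">(.*?)<\/a>/', $raw, $lazy);

echo $greedy[1], "\n"; // first</a> and <a class="fontLink">second
echo $lazy[1], "\n";   // first
```

With only one link on the line both patterns give the same result, so the problem tends to show up only once the input grows.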

kuroutadori
A: 

What is your input? If it's raw HTML from the web, a regexp is not a reliable way to do that. It would be better to load the document into a DOM tree and query it.

Guillaume Lebourgeois
A: 

You need positive lookahead and lookbehind, so your pattern will be like this:

(?<=class=\"fontLink\"\>).*(?=\<\/a\>)
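
As a rough sketch of how that pattern could be used: the delimiters, the lazy .*? and the placeholder address below are my additions, not part of the pattern above.

```php
<?php
// Hypothetical input; the address is a placeholder.
$raw = '<a href="http://example.com" class="fontLink">user@example.com</a>';

// Lookbehind and lookahead assert the surrounding markup without consuming it,
// so $matches[0] holds only the link text. The lazy .*? is an addition here.
preg_match_all('/(?<=class="fontLink">).*?(?=<\/a>)/', $raw, $matches);

print_r($matches[0]);
```

Because the lookarounds consume nothing, no capture group is needed; the full match is already the inner text.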
KoKo
A: 

Use a parser. Luckily, PHP has one!

$html = '<div class="detailsGray"><span class="detailEmail"><a href="http://example.com" class="fontLink">[email protected]</a></span></div>';
echo retrieve_node_text($html, "//a[@class='fontLink']");

// -----------------------------------------------
function retrieve_node_text($html_fragment, $xpath) {
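  $fragment = new DOMDocument();

  // loadHTML() returns FALSE on failure; the DOMDocument object is always truthy.
  if ($fragment->loadHTML($html_fragment)) {
    $xp = new DOMXPath($fragment);
    $result = $xp->query($xpath);

    // query() returns FALSE for a malformed expression.
    if ($result !== FALSE && $result->length == 1) {
      return $result->item(0)->textContent;
    }
  }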
  return FALSE;
}

returns:

[email protected]
Tomalak
Thank you so much
Ajith
@Ajith: Don't forget to put some error checking and handling into the code. It's not guaranteed that `loadHTML()` or `query()` run successfully, since both `$html_fragment` and `$xpath` could be broken. Be sure to test with broken input and to handle PHP errors or warnings accordingly.
Tomalak
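
One possible way to add the error handling mentioned in that comment, as a sketch only: the helper name node_text_or_null() is hypothetical, and returning null on every failure is just one choice among several.

```php
<?php
// Hypothetical helper name; returns the node text, or null on any failure.
function node_text_or_null(string $html, string $xpath): ?string {
  if ($html === '') {
    return null; // loadHTML() rejects an empty string
  }

  // Collect libxml parse errors internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(true);
  $doc = new DOMDocument();
  $loaded = $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  if (!$loaded) {
    return null;
  }

  // @ silences the warning that query() raises for a malformed expression.
  $xp = new DOMXPath($doc);
  $nodes = @$xp->query($xpath);
  if ($nodes === false || $nodes->length !== 1) {
    return null;
  }
  return $nodes->item(0)->textContent;
}

var_dump(node_text_or_null('<a class="fontLink">text</a>', "//a[@class='fontLink']"));
var_dump(node_text_or_null('<a class="fontLink">text</a>', "//a[@class="));
```

The first call uses a valid expression and the second a deliberately broken one, so both paths can be tried.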
A: 

I think your approach was good enough. This is my solution:

preg_match('/class=\"fontLink"\>(.*)\<\/a\>/', $raw, $matches);
$parsedEmail = $matches[1];

Just add parentheses around the part that you want, so it is captured on its own. If you only want to match one occurrence, use preg_match() instead of preg_match_all().
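
If there can be several links, the same idea works with preg_match_all(). This is a sketch with made-up placeholder addresses, and it uses [^<]* instead of .* so that each match stops at the next tag:

```php
<?php
// Hypothetical input with two matching links; the addresses are placeholders.
$raw = '<a class="fontLink">a@example.com</a><a class="fontLink">b@example.com</a>';

// $matches[0] holds the full matches, $matches[1] the text captured by (...).
// [^<]* stops at the next tag, an alternative to .*? chosen for this sketch.
$count = preg_match_all('/class="fontLink">([^<]*)<\/a>/', $raw, $matches);

if ($count) {
  print_r($matches[1]); // a@example.com, b@example.com
}
```

Checking the return value first avoids reading $matches[1] when nothing matched.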

LatinSuD