Hi, I am looking for a regular expression in PHP that matches an anchor containing specific text. For example, I would like to get anchors with the text mylink, like:

<a href="blabla" ... >mylink</a>

So it should match any anchor, but only if it contains the specific text. It should match these strings:

<a href="blabla" ... >mylink</a>

<a href="blabla" ... >blabla mylink</a>

<a href="blabla" ... >mylink bla bla</a>

<a href="blabla" ... >bla bla mylink bla bla</a>

but not this one:

<a href="blabla" ... >bla bla bla bla</a>

This one should not match because it does not contain the word mylink.

This one should not match either: "mylink is string", because it is not an anchor.

Does anybody have an idea?

Thanks, Granit

+1  A: 

This should work (build the regex string and insert whatever text you need in place of "mylink"):

<\s*a\s+[^>]*>[^<>]*mylink[^<>]*<\s*\/a\s*>

But this is not recommended. You should use an HTML parser instead and process the tag; regex is not really the right tool for this. (The regex above will not work if a link's attributes or text contain a ">", although that might be rare.)

I presume PHP doesn't require any additional escaping as long as you wrap the pattern in an appropriate delimiter.

Tested at regexpal.com

A few notes:
\s* - Matches optional whitespace
\s+ - Matches at least one whitespace character (space, tab, etc.)
[^>] - Matches any character except '>'
[^<>] - Matches any character except '<' or '>'

UPDATE: escaped the "/" so the pattern also works in PHP with the usual /regex/ delimiters.
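A sketch of how the pattern above might be built around an arbitrary word in PHP, assuming the word comes from a variable. `preg_quote()` escapes any regex metacharacters in the word (and the delimiter, when it is passed as the second argument). Using `~` as the delimiter means the "/" in "</a>" needs no escaping; with "/" as the delimiter, an unescaped "/" ends the pattern early and PHP reads the following characters as modifiers, which produces an "Unknown modifier" warning. The helper name `anchorPatternFor` is hypothetical.

```php
<?php
// Hypothetical helper: builds the answer's pattern around an arbitrary word.
// preg_quote() escapes regex metacharacters in $word; passing '~' as the
// second argument also escapes the delimiter if it appears in $word.
function anchorPatternFor(string $word): string
{
    $w = preg_quote($word, '~');
    // '~' delimits the pattern, so the "/" in "</a>" needs no backslash.
    return '~<\s*a\s+[^>]*>[^<>]*' . $w . '[^<>]*<\s*/a\s*>~';
}

// Sample inputs taken from the question.
$samples = [
    '<a href="blabla" ... >mylink</a>',
    '<a href="blabla" ... >blabla mylink</a>',
    '<a href="blabla" ... >mylink bla bla</a>',
    '<a href="blabla" ... >bla bla mylink bla bla</a>',
    '<a href="blabla" ... >bla bla bla bla</a>',
    'mylink is string',
];

$pattern = anchorPatternFor('mylink');
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    echo (preg_match($pattern, $s) === 1 ? 'match:    ' : 'no match: '), $s, "\n";
}
```

The first four samples match; the anchor without "mylink" and the plain "mylink is string" do not.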

Jass
Note that an attribute value can contain a plain `>`.
Gumbo
Of course, I've added that disclaimer. I could go ahead and add href="[^"]*"|'[^']*', but next you would want every attribute to allow >, and then I would have to make sure attribute names start only with a letter, not a digit. That's why I said to use an HTML parser. :D
Jass
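A sketch of the idea in the comment above, extended to both quote styles: an opening-tag pattern that consumes quoted attribute values as whole units, so a `>` inside a value does not end the tag early. Each repetition of the group eats either one character that is not `>` or a quote, or an entire quoted value. The sample `$html` is made up, and this is still a regex approximation, not a full HTML parser.

```php
<?php
// Opening <a> tag that skips over "..." and '...' attribute values,
// so a ">" inside a quoted value does not terminate the tag.
// \b after "a" keeps tags like <abbr> from matching.
$openTag = '<a\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>';
$pattern = '~' . $openTag . '([^<>]*mylink[^<>]*)</a>~i';

// The simpler pattern from the answer, for comparison.
$simple = '~<\s*a\s+[^>]*>[^<>]*mylink[^<>]*<\s*/a\s*>~';

// Made-up sample: the href value contains a plain ">".
$html = '<a href="/search?q=a>b">mylink</a>';

$found = preg_match($pattern, $html, $m);
var_dump(preg_match($simple, $html)); // int(0): [^>]* stops at the ">" in href
var_dump($found);                     // int(1)
var_dump($m[1]);                      // string(6) "mylink"
```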
I get a warning: Warning: preg_match(): Unknown modifier 'a'
Granit
@Granit, you'll either need to escape the `/` in your regex, or use a different delimiter. But really, what's wrong with my suggestion?
Bart Kiers
@Granit you'll need to put the regex in ~regex~ instead of the usual /regex/, or escape the /.
Jass
@Granit: Use an HTML parser; it's better any day. An existing SAX-based parser that captures the a tag should do it. Simple and neat. +1 to Bart.
Jass
A: 
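// $text holds the HTML to search; % is the delimiter, so "/" needs no escaping.
if (preg_match('%<\s*a\s+href="blabla"[^>]*>(.*mylink.*)<\s*/a>%', $text, $regs)) {
    $result = $regs[1]; // text between the opening <a ...> tag and </a>
} else {
    $result = "";       // no matching anchor found
}

$regs[0] will hold the complete match, and $regs[1] will hold the part inside the a tag.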
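A small sketch of what `$regs` contains with this pattern, using made-up sample strings. The second call shows one consequence of the greedy `.*`: when several anchors sit on the same line, the capture runs on to the last `</a>` it can reach.

```php
<?php
$pattern = '%<\s*a\s+href="blabla"[^>]*>(.*mylink.*)<\s*/a>%';

// One anchor: $regs[0] is the whole tag, $regs[1] the link text.
preg_match($pattern, 'See <a href="blabla">bla mylink bla</a> here.', $regs);
echo $regs[0], "\n"; // <a href="blabla">bla mylink bla</a>
echo $regs[1], "\n"; // bla mylink bla

// Two anchors on one line: the trailing greedy ".*" extends to the final "</a>".
preg_match($pattern, '<a href="blabla">mylink</a> and <a href="blabla">other</a>', $regs2);
echo $regs2[1], "\n"; // mylink</a> and <a href="blabla">other
```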

rikh
Note that an attribute value can contain a plain `>`.
Gumbo
A: 
/<a[^>]*>([^<]*mylink[^<]*)<\/a>/

It's a bit simplistic: it will fail to match if there are tags inside the link text (e.g. <a href="/xyz">xyz <i>mylink</i> aaa</a>), but otherwise it should work.
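A sketch running this pattern over a made-up string that contains several anchors. `preg_match_all()` collects every non-overlapping match, and `$m[1]` holds the captured link texts. The anchor with a nested `<i>` tag is skipped, as described above.

```php
<?php
$pattern = '/<a[^>]*>([^<]*mylink[^<]*)<\/a>/';

$html = '<a href="blabla">mylink</a> '
      . '<a href="blabla">bla bla mylink bla bla</a> '
      . '<a href="blabla">bla bla bla bla</a> '
      . 'mylink is string '
      . '<a href="/xyz">xyz <i>mylink</i> aaa</a>';

// Returns the number of full matches; $m[0] = whole tags, $m[1] = link texts.
$count = preg_match_all($pattern, $html, $m);

echo $count, "\n";                  // 2
echo implode(' | ', $m[1]), "\n";   // mylink | bla bla mylink bla bla
```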

Piskvor
Note that an attribute value can contain a plain `>`.
Gumbo
+4  A: 

Try a parser instead:

require_once "simple_html_dom.php";

$data = 'Hi, I am looking for a regular expression in PHP which would match the anchor with a specific text on it. E.g I would like to get anchors with text mylink like:
<a href="blabla" ... >mylink</a>

So it should match all anchors but only if they contain specific text So it should match these string:

<a href="blabla" ... >mylink</a>

<a href="blabla" ... >blabla mylink</a>

<a href="blabla" ... >mylink bla bla</a>

<a href="blabla" ... >bla bla mylink bla bla</a>

but not this one:

<a href="blabla" ... >bla bla bla bla</a> Because this one does not contain word mylink.

Also this one should not match: "mylink is string" because it is not an anchor.

Anybody any Idea? Thanx Granit';

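// Parse the HTML string into a simple_html_dom object.
$html = str_get_html($data);

// Visit every <a> element and check whether its inner HTML contains "mylink".
foreach($html->find('a') as $element) {
  if(strpos($element->innertext, 'mylink') === false) {
    echo 'Ignored: ' . $element->innertext . "\n";
  } else {
    echo 'Matched: ' . $element->innertext . "\n";
  }
}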

which produces the output:

Matched: mylink
Matched: mylink
Matched: blabla mylink
Matched: mylink bla bla
Matched: bla bla mylink bla bla
Ignored: bla bla bla bla

Download simple_html_dom.php from: http://simplehtmldom.sourceforge.net/
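For comparison, a sketch of the same parser-based idea using PHP's built-in DOM extension (`DOMDocument`) instead of the third-party simple_html_dom library, so nothing needs to be downloaded. The helper name `findAnchorsContaining` and the sample HTML are made up for illustration. Because `textContent` concatenates all descendant text, a link with a nested `<i>` tag is found too, and a quoted `>` in an attribute value does not confuse the parser.

```php
<?php
// Hypothetical helper: returns the text of every <a> element whose text
// (including text inside nested tags) contains $word.
function findAnchorsContaining(string $html, string $word): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // textContent concatenates all descendant text nodes.
        if (strpos($anchor->textContent, $word) !== false) {
            $matches[] = $anchor->textContent;
        }
    }
    return $matches;
}

$html = '<p>mylink is string</p>'
      . '<a href="blabla">mylink</a>'
      . '<a href="blabla">blabla mylink</a>'
      . '<a href="blabla">bla bla bla bla</a>'
      . '<a href="/xyz">xyz <i>mylink</i> aaa</a>'
      . '<a href="x>y">bla bla mylink bla bla</a>';

foreach (findAnchorsContaining($html, 'mylink') as $text) {
    echo 'Matched: ' . $text . "\n";
}
```

The `<p>` text and the anchor without "mylink" are not reported; the other four anchors are, in document order.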

Bart Kiers