Hi,

I have a bunch of strings, each containing an anchor tag and a URL.

Example string:

here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!

I want to strip out the anchor tags and everything in between.

Desired result:

here is a link. enjoy!

The URLs in the href= portion don't always match the link text, however (sometimes there are shortened URLs, sometimes just descriptive text).

I'm having an extremely difficult time figuring out how to do this with either regular expressions or PHP functions. How can I remove an entire anchor tag/link from a string?

Thanks!

+1  A: 

You shouldn't use regex to parse HTML; use an HTML parser instead.
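
As a rough sketch of the parser route, the snippet below uses PHP's bundled DOM extension to drop every <a> element together with its contents. The helper name removeAnchors() and the wrapping <div> are assumptions made for illustration, and encoding handling for non-ASCII input is left out.

```php
<?php
// Hypothetical helper: removes every <a> element (tag and inner text)
// from an HTML fragment using the DOM extension.
function removeAnchors(string $html): string
{
    $doc = new DOMDocument();
    // Suppress warnings about imperfect markup while parsing.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it can be located again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    // Copy the live node list to an array before removing nodes from it.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $a->parentNode->removeChild($a);
    }

    // The wrapper is the first <div> in document order.
    $wrap = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeAnchors('here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!');
```

The space that preceded the link is left in place, so trim it separately if the exact spacing matters.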

But if you must use regex, and your anchor tags' inner contents are guaranteed to be free of HTML like </a>, and each string is guaranteed to contain only one anchor tag as in the example case, then - only then - you can use something like:

Replace /^(.+)<a.+<\/a>(.+)$/ with $1$2
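
In PHP that substitution could be written with preg_replace(), roughly as below. The function name stripSingleAnchor() is made up for this sketch, and it carries the same assumptions as the pattern itself: exactly one anchor per string, with some text both before and after it on a single line.

```php
<?php
// Hypothetical helper; assumes the string contains exactly one anchor tag.
function stripSingleAnchor(string $str): string
{
    // $1 is the text before the link, $2 is the text after it.
    // Fall back to the input if preg_replace() reports an error (null).
    return preg_replace('/^(.+)<a.+<\/a>(.+)$/', '$1$2', $str) ?? $str;
}

echo stripSingleAnchor('here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!');
// prints: here is a link . enjoy!
```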

Amarghosh
Many thanks for the link.
minimalpop
A: 

Since your problem seems to be very specific, I think this should do it:

$str = preg_replace('#\s?<a.*/a>#', '', $str);
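
If a string can hold more than one link, the greedy .* above would also swallow the text between the first <a and the last /a>. A possible variation, offered only as a sketch (the name stripAnchors() and the sample URLs are assumptions), uses a non-greedy match and a word boundary:

```php
<?php
// Hypothetical helper: removes each anchor separately, plus one leading space.
function stripAnchors(string $str): string
{
    // \b stops <abbr>, <address>, etc. from matching;
    // .*? stops at the first </a> instead of the last one;
    // the s flag lets . span newlines and i ignores the case of the tag name.
    return preg_replace('#\s?<a\b[^>]*>.*?</a>#is', '', $str) ?? $str;
}

echo stripAnchors('see <a href="http://a.example">A</a> and <a href="http://b.example">B</a>. enjoy!');
// prints: see and. enjoy!
```

This is still a regex working on HTML, so it inherits the usual caveats about malformed or nested markup.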
kemp
A: 

Just use the normal PHP string functions.

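$str = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
// Split on the closing tag; every chunk except the last ends with a link.
$s = explode("</a>", $str);
foreach ($s as $b) {
    if (strpos($b, "href") !== false) {
        // Print only the text that precedes the opening <a tag.
        $m = strpos($b, "<a");
        echo substr($b, 0, $m);
    }
}
// The last chunk is the text after the final </a>.
print end($s);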

output

$ php test.php
here is a link . enjoy!
ghostdog74
+1  A: 

Looking at your result example, it seems like you're just removing the tags and their content. Did you want to keep what you stripped out or not? If you only need the tags gone, you might be looking for strip_tags(), which removes the tags but keeps the text between them.

pssdbt
A: 
$string = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
$text = strip_tags($string);
echo $text; // Outputs "here is a link http://www.google.com. enjoy!" (strip_tags() removes the tags but keeps the link text)
pnm123