I want to extract the URLs of all links in a string that have certain anchor text.

I saw a previous post on doing this in JavaScript. Can anyone help me do this in PHP?

http://stackoverflow.com/questions/369147/javascript-regex-to-extract-anchor-text-and-url-from-anchor-tags

+2  A: 

If you're parsing HTML to extract href attribute values from anchor tags, use an HTML/DOM Parser (definitely don't use regex).

PHP Simple HTML DOM Parser

PHP XML DOM
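
As a rough sketch of the DOM approach, something like the following could work with PHP's built-in DOMDocument class (this assumes the DOM extension is enabled, which it usually is). The function name `extractLinksByAnchorText`, the `$keyword` parameter and the sample HTML are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical helper (illustration only): returns the href of every <a>
// element whose text contains $keyword, compared case-insensitively.
function extractLinksByAnchorText(string $html, string $keyword): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // textContent also includes text inside nested tags such as <b>.
        if ($anchor->hasAttribute('href')
            && stripos($anchor->textContent, $keyword) !== false) {
            $urls[] = $anchor->getAttribute('href');
        }
    }
    return $urls;
}

// Sample input, made up for illustration.
$html = '<p><a href="http://example.com/a">Our cornerstone article</a> '
      . '<a class="x" href="http://example.com/b">Something else</a> '
      . '<a href="http://example.com/c">A <b>Cornerstone</b> guide</a></p>';

print_r(extractLinksByAnchorText($html, 'cornerstone'));
```

For the sample string this should pick up the first and third links. Unlike a regex, it does not care about attribute order, quoting style, or tags nested inside the anchor text.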

webbiedave
A: 
preg_match_all('#<a\s+href\s*=\s*"([^"]+)"[^>]*>([^<]+)</a>#i', $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match[0]; // full match: <a href="url" ...>text</a> (href must be the first attribute)
    echo $match[1]; // url
    echo $match[2]; // text
}

This is how I'd do it with regex. There may be more efficient ways but this should be the simplest one.

EDIT: I noticed that you wanted to match all URLs, so I changed this to preg_match_all.

preg_match_all

Revolt
Actually, I'm looking for instances of one specific keyword, 'cornerstone'. Maybe the simplest way to do this would be to go through all the URLs and then pick out the ones that contain 'cornerstone' as part of the anchor text?
Bob Cavezza
In that case the pattern becomes '#<a\s+href\s*=\s*"([^"]+)"[^>]*>([^<]*cornerstone[^<]*)</a>#i'
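
Plugged into preg_match_all, that pattern might be used roughly like this. The sample `$subject` string and the `$keyword` variable are assumptions added for illustration; preg_quote is only there so the keyword can be swapped for one containing regex metacharacters.

```php
<?php
// Sample input, made up for illustration.
$subject = '<a href="http://example.com/a">Our cornerstone article</a> '
         . '<a href="http://example.com/b">Something else</a>';

// Hypothetical variable; preg_quote escapes regex metacharacters in it,
// including the # delimiter passed as the second argument.
$keyword = 'cornerstone';
$pattern = '#<a\s+href\s*=\s*"([^"]+)"[^>]*>([^<]*'
         . preg_quote($keyword, '#')
         . '[^<]*)</a>#i';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$urls = [];
foreach ($matches as $match) {
    $urls[] = $match[1]; // URL of a link whose anchor text contains the keyword
}
print_r($urls);
```

Keep in mind this only matches anchors where href is the first attribute, the value is double-quoted, and the anchor text contains no nested tags.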
Revolt