views: 385
answers: 2

I'd like to grab a few hundred URLs from a few hundred HTML pages.

Pattern:

<h2><a href="http://www.the.url.might.be.long/urls.asp?urlid=1" target="_blank">The Website</a></h2>
+1  A: 
'/http:\/\/[^\/]+\/[^.]+\.asp\?urlid=\d+/'
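A minimal sketch of running that pattern with preg_match_all; the sample string is taken from the question's own markup:

```php
<?php
// Sample markup matching the question's pattern (from the question itself)
$html = '<h2><a href="http://www.the.url.might.be.long/urls.asp?urlid=1" target="_blank">The Website</a></h2>';

// Note the escaped inner slash (\/); left unescaped, it would
// terminate the pattern early, since / is also the delimiter
preg_match_all('/http:\/\/[^\/]+\/[^.]+\.asp\?urlid=\d+/', $html, $matches);

print_r($matches[0]);
```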

But it's better to use an HTML parser. Here is an example with PHP Simple HTML DOM:

// Requires simple_html_dom.php from the PHP Simple HTML DOM Parser library
include 'simple_html_dom.php';

$html = file_get_html('http://www.google.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
S.Mark
+3  A: 

Here is how to do it properly with the native DOM extension:

// Load the remote document (suppress warnings from malformed real-world HTML)
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTMLFile('http://example.com/');

// Run an XPath query to fetch all href attributes of <a> elements
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a/@href');

// Collect the href values from the DOMAttr nodes into an array
$urls = array();
foreach($links as $link) {
    $urls[] = $link->value;
}
print_r($urls);

Note that the above will also find relative links. If you don't want those, adjust the XPath to

'//a/@href[starts-with(., "http")]'
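For instance, parsed from an inline string rather than a URL (the sample markup with one absolute and one relative link is assumed), the filtered query returns only the absolute link:

```php
<?php
// Sample document with one absolute and one relative link (assumed for illustration)
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="http://example.com/a.asp?urlid=1">abs</a><a href="/relative">rel</a></p>');

$xpath = new DOMXPath($doc);
$urls = array();
// Keep only href attributes whose value starts with "http"
foreach ($xpath->query('//a/@href[starts-with(., "http")]') as $attr) {
    $urls[] = $attr->value;
}
print_r($urls);
```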

Note that using Regex to match HTML is the road to madness. Regex matches string patterns and knows nothing about HTML elements and attributes. DOM does, which is why you should prefer it over Regex for anything beyond matching a truly trivial string pattern in markup.
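Since the question mentions a few hundred pages, the same DOM approach can be wrapped in a loop. A sketch using inline strings in place of real fetches; in practice each entry would be a loadHTMLFile() call per page URL, and the sample markup here is assumed:

```php
<?php
// Placeholder pages standing in for real fetched documents (assumed)
$pages = array(
    '<h2><a href="http://site.example/urls.asp?urlid=1">One</a></h2>',
    '<h2><a href="http://site.example/urls.asp?urlid=2">Two</a></h2>',
);

$urls = array();
foreach ($pages as $html) {
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a/@href') as $attr) {
        $urls[] = $attr->value;
    }
}
print_r($urls);
```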

Gordon