Hello, I want to extract links such as <a href="/portal/clients/show/entityId/2121" > and I need a regex that gives me /portal/clients/show/entityId/2121. The number at the end (2121) is different in other links. Any ideas?

A: 

Parsing links from HTML can be done using an HTML parser.

Once you have all the links, simply find the index of the last forward slash in each one; everything after it is your number. No regex needed.
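
The last-slash idea could be sketched in PHP roughly like this. The helper name lastPathSegment is hypothetical (it is not part of any library), and the sketch assumes the href has already been extracted by a parser:

```php
<?php
// Hypothetical helper, for illustration only:
// returns whatever follows the last "/" in the given href.
function lastPathSegment(string $href): string
{
    $pos = strrpos($href, '/');
    // No slash at all: treat the whole string as the segment.
    return $pos === false ? $href : substr($href, $pos + 1);
}

echo lastPathSegment('/portal/clients/show/entityId/2121'), "\n";
```

Note that this returns the last segment whatever it contains, so it only yields the number if every link really ends in one.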

Bart Kiers
hmm.. $html->find('href') or what?
streetparade
I don't know. Where does this find(...) come from?
Bart Kiers
+4  A: 

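PHP Simple HTML DOM Parser example:

// Load the library (assuming simple_html_dom.php is on the include path)
include 'simple_html_dom.php';

// Create DOM from string
$html = str_get_html($links);

// or from a URL (the scheme is required)
$html = file_get_html('http://www.example.com/');

// Print the href attribute of every <a> element
foreach ($html->find('a') as $link) {
    echo $link->href . '<br />';
}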
karim79
This gives me the whole tag as the result: <a href="/portal/clients/show/entityId/4636" ><img src="/img/bullet_go.png" alt="" title="Kundenakte aufrufen" /></a>
streetparade
But I only want to extract /portal/clients/show/entityId/4636, so this worked: '/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'
streetparade
@streetparade my bad, forgot to say $link->href, edited
karim79
A: 

Regex for parsing links is something like this:

'/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'

Given how horrible that is, I would recommend using Simple HTML DOM at least for getting the links. You could then check each link with a very basic regex on its href.
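
A basic check on the href might look like the sketch below. The function name entityIdFromHref is hypothetical, and the pattern assumes every wanted link has the shape shown in the question (/portal/clients/show/entityId/ followed by digits):

```php
<?php
// Hypothetical helper, for illustration only:
// returns the numeric id as a string, or null if the href has another shape.
function entityIdFromHref(string $href): ?string
{
    // Assumed link shape, taken from the question:
    // /portal/clients/show/entityId/<digits>
    if (preg_match('#^/portal/clients/show/entityId/(\d+)$#', $href, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(entityIdFromHref('/portal/clients/show/entityId/4636'));
var_dump(entityIdFromHref('/img/bullet_go.png'));
```

Using # as the pattern delimiter avoids having to escape the forward slashes in the path.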

Yacoby
This worked for me: $patterndocumentLinks ='/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'; Thank you.
streetparade
+1  A: 

When "parsing" HTML I mostly rely on PHPQuery (http://code.google.com/p/phpquery/) rather than regex.

Max
+2  A: 

Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
    # Xpath query for attributes gives a NodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $nodeList->item($i)->value . "<br/>\n";
}
soulmerge