Hello, I want to extract links like this one:
<a href="/portal/clients/show/entityId/2121" >
and I want a regex that gives me /portal/clients/show/entityId/2121.
The trailing number (2121) is different in other links.
Any ideas?
A:
Parsing links from HTML can be done using an HTML parser.
Once you have all the links, simply get the index of the last forward slash, and you have your number. No regex needed.
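A minimal sketch of this approach, assuming PHP's built-in DOM extension as the parser (the answer does not name one) and a made-up HTML sample. It collects every `href` and keeps whatever follows the last `/`:

```php
<?php
// Sketch: parse the HTML with the built-in DOMDocument, then take the
// part of each href after the last "/" as the ID. No regex involved.
// The sample HTML is an assumption for illustration.
$html = '<a href="/portal/clients/show/entityId/2121">A</a>'
      . '<a href="/portal/clients/show/entityId/4636">B</a>';

libxml_use_internal_errors(true); // don't emit warnings for sloppy HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    $pos = strrpos($href, '/');      // index of the last forward slash
    if ($pos === false) {
        continue;                    // no slash, so no ID to extract
    }
    $ids[] = substr($href, $pos + 1);
}

echo implode(', ', $ids), "\n"; // 2121, 4636
```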
Bart Kiers
2009-10-05 12:10:53
Hmm... $html->find('href'), or what?
streetparade
2009-10-05 12:11:52
I don't know. Where does this find(...) come from?
Bart Kiers
2009-10-05 12:42:36
+4
A:
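PHP Simple HTML DOM Parser example:
// Requires the PHP Simple HTML DOM Parser library
require_once 'simple_html_dom.php';

// Create DOM from a string
$html = str_get_html($links);
// or load it from a URL (the scheme is needed, otherwise it is read as a local file)
$html = file_get_html('http://www.example.com/');

foreach ($html->find('a') as $link) {
    echo $link->href . '<br />';
}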
karim79
2009-10-05 12:19:33
This gives me the following as the result: <a href="/portal/clients/show/entityId/4636" ><img src="/img/bullet_go.png" alt="" title="Kundenakte aufrufen" /></a>
streetparade
2009-10-05 12:26:21
But I only want to extract /portal/clients/show/entityId/4636, so I used this, which worked: '/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'
streetparade
2009-10-05 12:26:57
A:
Regex for parsing links is something like this:
'/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'
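A sketch of applying this pattern with `preg_match_all()`. Inside a single-quoted PHP string the `'` characters have to be escaped as `\'`. The sample input is assumed, and capture group 1 still includes the surrounding quotes, so they are trimmed off:

```php
<?php
// Sketch: run the link regex over an HTML snippet and collect the hrefs.
// Single quotes inside the pattern are escaped for the PHP string literal.
$pattern = '/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i';

// Sample input, assumed for illustration.
$html = '<a href="/portal/clients/show/entityId/4636" ><img src="/img/bullet_go.png" alt="" /></a>';

preg_match_all($pattern, $html, $matches);

// Group 1 is the href value including its quotes; strip them.
$hrefs = array_map(function ($h) {
    return trim($h, '"\'');
}, $matches[1]);

echo implode("\n", $hrefs), "\n"; // /portal/clients/show/entityId/4636
```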
Given how horrible that is, I would recommend using Simple HTML DOM at least for getting the links. You could then check each link with a very basic regex on its href.
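A sketch of the "parser first, small regex second" idea. It swaps in PHP's built-in `DOMDocument` for Simple HTML DOM so that no third-party library is needed, and assumes every client link has the exact shape `/portal/clients/show/entityId/<number>`:

```php
<?php
// Sketch: get the hrefs with the built-in DOM parser, then use a tiny
// regex to keep only client links and capture the trailing number.
// Sample HTML and the URL shape are assumptions taken from the question.
$html = '<p><a href="/portal/clients/show/entityId/2121">Client</a>'
      . '<a href="/img/bullet_go.png">Other</a></p>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Anchored, simple pattern: no need to describe HTML tag syntax.
    if (preg_match('#^/portal/clients/show/entityId/(\d+)$#', $href, $m)) {
        $ids[] = (int) $m[1];
    }
}

echo implode(', ', $ids), "\n"; // 2121
```

Because the regex only ever sees a single attribute value, it stays short and cannot be confused by quoting or nested tags.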
Yacoby
2009-10-05 12:20:40
This worked for me: $patterndocumentLinks = '/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'; Thank you.
streetparade
2009-10-05 12:25:43
+1
A:
When "parsing" HTML I mostly rely on PHPQuery: http://code.google.com/p/phpquery/ rather than regex.
Max
2009-10-05 12:24:58
+2
A:
Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
# Xpath query for attributes gives a NodeList containing DOMAttr objects.
# http://php.net/manual/en/class.domattr.php
echo $nodeList->item($i)->value . "<br/>\n";
}
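A sketch extending this XPath approach so the query itself selects only the client links. It assumes those links all contain `/entityId/` in their `href` (as in the question's examples), uses made-up sample HTML, and then takes the text after the last `/` as the ID:

```php
<?php
// Sketch: filter inside the XPath query, then cut the ID off the end.
// The "/entityId/" marker and the sample HTML are assumptions.
$htmlAsString = '<a href="/portal/clients/show/entityId/2121">A</a>'
              . '<a href="/portal/clients/show/entityId/4636">B</a>'
              . '<a href="/img/bullet_go.png">C</a>';

libxml_use_internal_errors(true); // suppress warnings for imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodeList = $xpath->query("//a[contains(@href, '/entityId/')]/@href");

$ids = [];
foreach ($nodeList as $attr) {
    // Each node is a DOMAttr; ->value is the href string.
    $href = $attr->value;
    $ids[] = substr($href, strrpos($href, '/') + 1);
}

echo implode(', ', $ids), "\n"; // 2121, 4636
```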
soulmerge
2009-10-05 12:28:57