ansaurus

Question

How can I match the second <a> tag in this string?

Answer 1

+2 A:

Something like:

   /<a.+?>[^<>]*Announcements[^<>]*<\/a>/

PS. Regular expressions are the wrong tool for parsing HTML.

stereofrog 2009-11-06 17:23:21

+1 for "regular expressions are the wrong tool for parsing HTML" - absolutely.

TrueWill 2009-11-06 18:23:04

Answer 2

+4 A:

Parse the fragment into a DOM. Use XPath to issue:

(//a)[2]

Done.
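
A minimal sketch of this approach in Python, using only the standard library. It assumes the fragment is well-formed enough to parse as XML, which holds for the string shown in the PHP answer further down but not for HTML in general. `xml.etree.ElementTree` implements only a subset of XPath and cannot evaluate `(//a)[2]` directly, so the sketch walks the tree in document order and takes the second `a` element, which should select the same node. `FRAGMENT` and `second_anchor` are illustrative names, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, split across lines for readability.
FRAGMENT = (
    '<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">'
    '<span><a class="ms-sitemapdirectional" href="/">My Site</a></span>'
    '<span> &gt; </span>'
    '<span><a class="ms-sitemapdirectional" '
    'href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>'
    '<span> &gt; </span>'
    '<span class="ms-sitemapdirectional">Settings</span>'
    '</span>'
)


def second_anchor(markup):
    """Return the second <a> element in document order, or None."""
    root = ET.fromstring(markup)
    # iter() yields matching elements in document order, like (//a).
    anchors = list(root.iter("a"))
    return anchors[1] if len(anchors) > 1 else None


link = second_anchor(FRAGMENT)
if link is not None:
    print(link.text, link.get("href"))
```

Note that the shorter path `.//a[2]` is not equivalent: the position predicate there counts among siblings, so it asks for an `a` that is the second `a` child of its own parent. A full XPath engine is needed to evaluate the parenthesised form as written.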

Tomalak 2009-11-06 17:23:55

+1 for using a DOM instead of a RegEx.

TrueWill 2009-11-06 18:22:24

I think this is the closest to what I'm looking for, rather than using a regex.

Jason Evans 2009-11-09 08:59:42

Answer 3

+1 A:

/(<a.*?<\/a>).*?(<a.*?<\/a>)/

$1 captures the first <a> element and $2 captures the second.
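
As a rough illustration, the same pattern can be tried in Python, where the two capture groups are read with `groups()` instead of `$1` and `$2`. The `re.DOTALL` flag is an addition so that `.` also crosses line breaks; `FRAGMENT`, `TWO_ANCHORS` and `first_two_anchors` are hypothetical names for this sketch.

```python
import re

# The fragment from the question, split across lines for readability.
FRAGMENT = (
    '<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">'
    '<span><a class="ms-sitemapdirectional" href="/">My Site</a></span>'
    '<span> &gt; </span>'
    '<span><a class="ms-sitemapdirectional" '
    'href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>'
    '<span> &gt; </span>'
    '<span class="ms-sitemapdirectional">Settings</span>'
    '</span>'
)

# Group 1 is the first anchor, group 2 the next one after it.
TWO_ANCHORS = re.compile(r"(<a.*?</a>).*?(<a.*?</a>)", re.DOTALL)


def first_two_anchors(markup):
    """Return (first, second) anchor markup, or None if fewer than two."""
    match = TWO_ANCHORS.search(markup)
    return match.groups() if match else None


result = first_two_anchors(FRAGMENT)
if result is not None:
    print(result[1])
```

One caveat: `<a` also matches the start of other tag names beginning with "a" (for example `<abbr>`), so the pattern is only safe on markup where that cannot happen.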

Rob 2009-11-06 17:32:06

Answer 4

A:

You don't have to use a complicated regular expression for this if you don't want to. Since you want to get anchors, and anchors usually have a closing tag (</a>), you can use your favourite language and split each line on </a>. For example, in pseudocode:

for each line in htmlfile
do
   var=split line on </a>
   for each item in var
   do
        if item has "Announcement" then
           print "found"
        end if
   done
done
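
The pseudocode above could look roughly like this in Python. This is a sketch under a few assumptions: it works on the whole string rather than line by line, it trims each piece back to its last `<a` so that only the anchor itself is tested, and it returns the matches instead of printing "found". `FRAGMENT` and `anchors_containing` are made-up names.

```python
# The fragment from the question, split across lines for readability.
FRAGMENT = (
    '<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">'
    '<span><a class="ms-sitemapdirectional" href="/">My Site</a></span>'
    '<span> &gt; </span>'
    '<span><a class="ms-sitemapdirectional" '
    'href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>'
    '<span> &gt; </span>'
    '<span class="ms-sitemapdirectional">Settings</span>'
    '</span>'
)


def anchors_containing(markup, needle):
    """Split on </a> and return the anchors whose markup contains needle."""
    pieces = markup.split("</a>")
    found = []
    # The last piece is whatever follows the final </a>, so skip it.
    for piece in pieces[:-1]:
        start = piece.rfind("<a")
        if start == -1:
            continue  # stray </a> with no opening tag before it
        anchor = piece[start:] + "</a>"  # restore the tag removed by split
        if needle in anchor:
            found.append(anchor)
    return found


for anchor in anchors_containing(FRAGMENT, "Announcement"):
    print("found", anchor)
```

Because the test is a plain substring check, it also matches when the word appears only in an attribute such as `href`, not just in the link text.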

ghostdog74 2009-11-07 01:31:42

Answer 5

A:

<?php
$string = '<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap"><span><a class="ms-sitemapdirectional" href="/">My Site</a></span><span> &gt; </span><span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span><span> &gt; </span><span class="ms-sitemapdirectional">Settings</span></span>';

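// Parse the fragment; loadHTML() tolerates the missing <html>/<body> wrapper.
$dom = new DOMDocument();
$dom->loadHTML($string);
$anchors = $dom->getElementsByTagName('a');
// item(1) is the second anchor, so make sure at least two exist.
if ($anchors->length > 1) {
    $secondAnchor = $anchors->item(1);
    // The inner HTML of the parent <span> is the anchor's own markup.
    echo innerHTML($secondAnchor->parentNode);
}

// Serialize the children of $node by importing them into a scratch document.
function innerHTML($node) {
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
    }

    return $doc->saveHTML();
}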

meder 2009-11-07 01:38:00

Answer 6

A:

If you know the exact text of the element, and you know it's the last element of its kind in the fragment, you have more than enough information to match it with a regex. I suspect you're using a regex like this:

/<a\s+.*>Announcements<\/a>/s

...and the .* is matching everything between the <a of the first anchor tag and the >Announcements</a> of the second one. Switching to a non-greedy quantifier:

/<a\s+.*?>Announcements<\/a>/s

...doesn't help; a reluctant quantifier stops matching as soon as possible, but the problem here is that it starts matching too soon. You need to replace the .* with something more specific, something that can only match whatever comes between the opening <a and closing > of a single tag:

/<a\s+[^<>]+>Announcements<\/a>/

Now, when it reaches the end of the first <a> tag and doesn't see Announcements</a> it will abort that match attempt, move along and start fresh at the second <a> tag.
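
The three patterns can be compared side by side with a short Python sketch. Python's `re.DOTALL` stands in for the `/s` modifier; `FRAGMENT` and the pattern names are illustrative only.

```python
import re

# The fragment from the question, split across lines for readability.
FRAGMENT = (
    '<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">'
    '<span><a class="ms-sitemapdirectional" href="/">My Site</a></span>'
    '<span> &gt; </span>'
    '<span><a class="ms-sitemapdirectional" '
    'href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>'
    '<span> &gt; </span>'
    '<span class="ms-sitemapdirectional">Settings</span>'
    '</span>'
)

GREEDY = re.compile(r"<a\s+.*>Announcements</a>", re.DOTALL)
RELUCTANT = re.compile(r"<a\s+.*?>Announcements</a>", re.DOTALL)
SPECIFIC = re.compile(r"<a\s+[^<>]+>Announcements</a>")

greedy = GREEDY.search(FRAGMENT).group(0)
reluctant = RELUCTANT.search(FRAGMENT).group(0)
specific = SPECIFIC.search(FRAGMENT).group(0)

# Both .* variants start at the first <a and run through the second anchor.
print(greedy == reluctant)
print(greedy.startswith('<a class="ms-sitemapdirectional" href="/">My Site'))
# The [^<>]+ variant cannot cross a tag boundary, so it yields one anchor.
print(specific)
```

On this fragment the greedy and reluctant patterns return the same over-long match, which is the "starts too soon" problem described above, while the `[^<>]+` pattern returns only the Announcements anchor.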

Alan Moore 2009-11-07 09:18:22
