Hello,

I'm trying to extract a link's text using a regex. Several links may match the pattern, and I want the last matching one, up to the 4th. Here is my JS code:

var level=1;
while ( _match = /<a href="http:\/\/www.mysite.com\/x\/(?:.*)>(.*)<\/a>/img.exec(_html)){
    if (level < 5)  (_anchor_text=_match[1]);
    level ++;
}

The problem is that this code enters an infinite loop in IE (it works fine in FF), even though the pattern is present in the HTML. Any help is appreciated.

A: 

Remove the matched text from _html after each match, and don't use the global flag (g):

var level=1;
var rg;
while ( _match = (rg = /<a href="http:\/\/www.mysite.com\/x\/(?:.*)>(.*)<\/a>/im).exec(_html)){
    if (level < 5)  (_anchor_text=_match[1]);
    level ++;
    _html = _html.replace(rg, "");
}
Tolgahan Albayrak
+2  A: 

RegExp.exec, I believe, makes use of the regex's lastIndex property, updating it after every match so that a global (g) regex resumes where the previous match ended. For that to work, every call has to go through the same RegExp object. Currently you're creating a new one on every iteration, so lastIndex starts at 0 each time, the first match is returned again and again, and the loop never ends.

Try this:

var level = 1;
var pattern = /<a href="http:\/\/www.mysite.com\/x\/(?:.*)>(.*)<\/a>/img;
var _match;
while ( _match = pattern.exec(_html)){
     if (level < 5)  (_anchor_text=_match[1]);
     level ++;
}
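
To see the effect of reusing one RegExp object, here is a minimal, self-contained sketch. The sample HTML and the helper name collectAnchorTexts are made up for illustration, and the pattern is a tightened variant of the one in the question (escaped dots, non-greedy capture), which is an assumption about what the markup looks like:

```javascript
// Hypothetical sample input; the URL mirrors the one in the question.
var sampleHtml =
    '<a href="http://www.mysite.com/x/1">one</a>\n' +
    '<a href="http://www.mysite.com/x/2">two</a>\n' +
    '<a href="http://www.mysite.com/x/3">three</a>';

// Illustrative helper (not from the question): collects up to `limit` link texts.
function collectAnchorTexts(html, limit) {
    // Created once per call, so lastIndex persists across the exec() calls below.
    var pattern = /<a href="http:\/\/www\.mysite\.com\/x\/[^"]*"[^>]*>(.*?)<\/a>/ig;
    var texts = [];
    var match;
    // exec() returns null once no further match exists, which ends the loop.
    while (texts.length < limit && (match = pattern.exec(html)) !== null) {
        texts.push(match[1]);
    }
    return texts;
}

var texts = collectAnchorTexts(sampleHtml, 4);
// texts is ["one", "two", "three"]
var lastText = texts[texts.length - 1];
// lastText is "three"
```

Because the pattern lives in a variable, each exec call continues from the previous lastIndex, and the limit check stops the loop after the 4th match instead of scanning the rest of the document.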
J-P
It actually works on Firefox, Chrome, Opera and Safari, if you use a regexp literal within the while statement. IE seems to be the one behaving differently. This is not to say that what IE is doing is wrong...
Ates Goral
@Ates, I think that behaviour is due to the fact that literal regular expressions are "cached" internally, so when you re-use one, you're just referencing the same regex object.
J-P
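
A small sketch of the point the comments are circling around: whether a regex literal yields the same object or a new one each time it is evaluated. As far as I know, ECMAScript 3 allowed a literal to denote one shared object, while ECMAScript 5 and later require a new object per evaluation, so the results noted below assume an ES5+ engine. The function name makePattern is made up for the example:

```javascript
// Hypothetical helper: every call evaluates the same regex literal again.
function makePattern() {
    return /x/g;
}

var first = makePattern();
var second = makePattern();

// In an ES5+ engine each evaluation yields a separate RegExp object.
var distinct = first !== second;

// Advancing one object does not affect the other: each has its own lastIndex.
first.exec("xx");
var firstIndex = first.lastIndex;   // 1 after matching the first "x"
var secondIndex = second.lastIndex; // still 0, this object was never used
```

In an engine that hands out a fresh object each time, a literal inside the while condition always starts from lastIndex 0, which would explain the endless loop; keeping the pattern in a variable avoids depending on either behaviour.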