Hello,

I have a weird issue with the JavaScript RegExp.exec function. When I call it multiple times on what I assume are new regexp objects, it only works every other time. I don't understand why at all!

Here is a small loop example, but the same thing happens when the regex is used once inside a function that is called multiple times.

for (var i = 0; i < 5; ++i) {
  console.log(i, (/(b)/g).exec('abc'));
}

> 0 ["b", "b"]
> 1 null
> 2 ["b", "b"]
> 3 null
> 4 ["b", "b"]
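The alternation is driven by the `lastIndex` property of a single shared RegExp object. The sketch below shares the object explicitly through a variable (`shared` and `results` are illustrative names, not from the question), so the same pattern should appear whatever the engine does with literals:

```javascript
// One RegExp object reused on every iteration, which is effectively what
// Chrome 4 and Firefox 3.6 did with the literal inside the loop.
var shared = /(b)/g;
var results = [];

for (var i = 0; i < 5; ++i) {
  var match = shared.exec('abc');
  // A successful global match moves lastIndex just past the "b" (to 2);
  // a failed one resets lastIndex to 0, so the next call starts over.
  results.push([match && match[0], shared.lastIndex]);
  console.log(i, match, shared.lastIndex);
}
```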

When I remove the /g, it goes back to normal.

for (var i = 0; i < 5; ++i) {
  console.log(i, (/(b)/).exec('abc'));
}             /* no g ^ */

> 0 ["b", "b"]
> 1 ["b", "b"]
> 2 ["b", "b"]
> 3 ["b", "b"]
> 4 ["b", "b"]
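Without `g` (or the later `y` flag), `exec` always searches from index 0 and does not update `lastIndex`, so sharing the object makes no difference. A small check, assuming one explicitly shared object (`nonGlobal` and `seen` are illustrative names):

```javascript
// The same pattern without "g", explicitly shared across iterations.
var nonGlobal = /(b)/;
var seen = [];

for (var i = 0; i < 5; ++i) {
  var match = nonGlobal.exec('abc');
  // Without g (or y), exec always searches from index 0 and leaves
  // lastIndex alone, so every call finds "b" at index 1.
  seen.push([match[0], match.index, nonGlobal.lastIndex]);
}
console.log(seen);
```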

I guess there is some optimization that reuses the regexp object, but it seems strange.

This behaviour is the same in Chrome 4 and Firefox 3.6, but it works as I expected in IE8. I believe it is intended, but I can't see the logic behind it; maybe you will be able to help me!

Thanks
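The logic comes from the language specification. ECMAScript 3 converted a regex literal into a RegExp object when it was scanned, and evaluating the literal returned a reference to that one object instead of creating a new one, so its `lastIndex` carried over between evaluations. ECMAScript 5 changed this: each evaluation of a literal now returns a new object, which matches what IE8 did, so current engines may print a match on every iteration of the first loop. A quick check of the ES5+ rule (`makeRegex` is a hypothetical helper):

```javascript
// Hypothetical helper that evaluates the same literal on every call.
function makeRegex() {
  return /(b)/g;
}

var first = makeRegex();
var second = makeRegex();

// Under ES5 and later, each evaluation creates a distinct object...
console.log(first === second); // false

// ...so advancing one object's lastIndex does not affect the other.
first.exec('abc');
console.log(first.lastIndex, second.lastIndex); // 2 0
```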

+3  A: 

/g is not intended to work for simple matching:

/g enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one.

I'd imagine that internally JavaScript remembers the position just after the last match so that it can resume matching from there; since b occurs only once in the subject, the next call therefore returns null. Compare:

for (var i = 0; i < 5; ++i) {
  console.log(i +'    ' + (/(b+)/g).exec("abbcb"));
}

returns:

0 bb,bb
1 b,b
2 null
3 bb,bb
4 b,b
SilentGhost
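The position that is "held" between calls is the object's `lastIndex` property, and it can be read and even moved by hand. A sketch with the regex hoisted into a variable (`runs`, `subject`, and the other names are illustrative):

```javascript
var runs = /(b+)/g;
var subject = 'abbcb';

var firstRun = runs.exec(subject); // "bb" at index 1
var resumeAt = runs.lastIndex;     // 3: the next search starts after "bb"

runs.lastIndex = 4;                // move the resume point by hand
var fromFour = runs.exec(subject); // the final "b", at index 4

console.log(firstRun[0], resumeAt, fromFour[0], fromFour.index, runs.lastIndex);
// prints: bb 3 b 4 5
```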
why the downvote?
SilentGhost
Depending on what "simple matching" means, the "g" option does make perfect sense with "exec". In this case, for example, if the test string had multiple "b" characters in it, the result array would have all of them.
Pointy
@Pointy: simple as in "not replace". And, as my code clearly shows, it doesn't work that way.
SilentGhost
OK, well you're mistaken. The "g" flag is very useful even when not doing a "replace" operation. For example, I might want to pluck all the numbers out of a sentence: /(\d+)/g.exec(sentence).
Pointy
SilentGhost is correct about the browsers reusing the regex. If you want it to work the same in all browsers, replace `(/(b)/g)` with `(new RegExp("(b)","g"))`. This creates a brand-new regex object instead of reusing the same one.
Gordon Tucker
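A minimal version of the constructor approach: because `new RegExp` always builds a fresh object, `lastIndex` starts at 0 on every iteration under both the old and new literal rules, at the cost of constructing a new object each time (`outcomes` is an illustrative name):

```javascript
var outcomes = [];

for (var i = 0; i < 5; ++i) {
  // The constructor builds a fresh object every iteration,
  // so lastIndex always starts at 0.
  var match = new RegExp('(b)', 'g').exec('abc');
  outcomes.push(match && match[0]);
}
console.log(outcomes); // "b" on every iteration
```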
@Pointy: except that, of course, it doesn't work in Firefox and, I assume, some other browsers.
SilentGhost
Or do as I suggested and simply reset the "lastIndex" property on each iteration of the loop. That the browser keeps the regex as an intact object is not in question; what's incorrect in SilentGhost's comment is the claim that the "g" option is therefore not useful outside of the "replace" operation.
Pointy
@Pointy: I said not intended, not "not useful".
SilentGhost
Where do you get that it's "not intended"? It's used outside of "replace" all the time, and it works exactly as documented. That IE doesn't do it properly is hardly a surprise.
Pointy
SilentGhost, you're just mistaken here. The "g" modifier on regex literals does exactly the same thing as the "g" in the second parameter to the RegExp constructor. The difference is that the constructor always instantiates a new object. I'll let you ponder which is more expensive: to construct a new RegExp instance (including the parse of the expression itself), or to reset the "lastIndex" property on an existing regular expression object.
Pointy
@Pointy: what exactly are we arguing about? `/g` does not work. Why? Because of the way the JavaScript regex engine treats this flag: after a successful match it holds the position and continues from there on the next call. You could reset `lastIndex`, you could create a new object, but **it's not going to match all occurrences of the pattern in the subject in one go**, as my example clearly shows. I'm happy that it works for you, though.
SilentGhost
Pointy is correct that /g is the same as passing "g" to the new RegExp constructor. Because browsers handle inline regex literals differently, it is best to avoid using them here.
Gordon Tucker
Well, SilentGhost, it looks like you're right about exec(): for whatever reason it doesn't do anything interesting with the "g" flag. However, the String "match()" method does do the "right thing" with it. (I don't think I've coded up a call to "exec()" in years.) Try `"Hello 123 456 789".match(/(\d+)/g)` in a Firebug console.
Pointy
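A short comparison of `String.prototype.match` with and without `g`, using the sentence from the comment above (`all` and `firstOnly` are illustrative names). With `g` it returns every full match but drops capture groups and positions; without `g` it returns only the first match, shaped like an `exec()` result:

```javascript
var sentence = 'Hello 123 456 789';

// With g: every full match, but no capture groups or positions.
var all = sentence.match(/(\d+)/g); // ["123", "456", "789"]

// Without g: only the first match, shaped like an exec() result
// (full match, then captures, plus an index property).
var firstOnly = sentence.match(/(\d+)/);

console.log(all);
console.log(firstOnly[0], firstOnly[1], firstOnly.index); // 123 123 6
```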
The problem with the match function is that it returns an array of the matched strings, not of what has been captured: `"_a _b _c".match(/_([a-c])/g)` returns ["_a", "_b", "_c"] instead of ["a", "b", "c"]. This is why I'm using exec instead of match. It's really strange behavior not to be able to get the captured values except through exec.
Vjeux
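The usual way to get every capture with `exec` is to lean on the `lastIndex` behaviour discussed above and call it in a loop until it returns null. A sketch, assuming the pattern has the `g` flag (`captureAll` is a hypothetical helper). Engines implementing ES2020 also provide `String.prototype.matchAll`, which yields the same exec-style results:

```javascript
// Hypothetical helper: collects capture group 1 from every match.
function captureAll(pattern, text) {
  if (!pattern.global) {
    // Without g, exec never advances and the loop below would not end.
    throw new TypeError('captureAll needs a regex with the g flag');
  }
  var captures = [];
  var match;
  pattern.lastIndex = 0; // start at the beginning even if the object was used before
  while ((match = pattern.exec(text)) !== null) {
    captures.push(match[1]);
    if (match[0] === '') {
      // An empty match leaves lastIndex where it was; step past it.
      pattern.lastIndex++;
    }
  }
  return captures;
}

var letters = captureAll(/_([a-c])/g, '_a _b _c'); // ["a", "b", "c"]

// ES2020 engines: matchAll yields exec-style results, captures included.
var viaMatchAll = Array.from('_a _b _c'.matchAll(/_([a-c])/g), function (m) {
  return m[1];
});
console.log(letters, viaMatchAll);
```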
+1  A: 

If you're going to reuse the same regular expression anyway, take it out of the loop and explicitly reset it:

var pattern = /(b)/g;
for (var i = 0; i < 5; ++i) {
  pattern.lastIndex = 0;
  console.log(i + ' ' + pattern.exec("abc"));
}
Pointy
This is probably the best way to do it, but it only works if you assign the regex to a variable rather than using it inline as the poster did (i.e. `(/(b)/g).lastIndex = 0` will *not* work).
Gordon Tucker
A: 

Thanks :)

I found an interesting side effect: it's possible to make a static variable (in the C sense: it persists like a global but is only visible inside the function) without a closure!

   function test () {
     var static = /a/g;
     if ('count' in static) {
       static.count++;
     } else {
       static.count = 1;
     }
     console.log(static.count);
   }

   for (var i = 0; i < 5; ++i) { test(); }
   1
   2
   3
   4
   5

(I'm making a new answer because we can't put code inside a comment)

Vjeux
This is not a surprise. Bracketed sections of code in constructs like "for" loops are NOT new lexical scopes in Javascript. In other words, the effect of the placement of the "var" statement inside the loop is precisely the same as placing it at the head of the surrounding function block.
Pointy
This acts differently in IE because of how IE handles regexes. In Chrome/Safari/FF, inline regex literals behave as if they were global even though they are assigned to a local variable. In IE, each one is constructed as a new regex, so it prints 1 1 1 1 1 instead of 1 2 3 4 5.
Gordon Tucker
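The counter above depends on ES3-style sharing of regex literals, so engines following ES5 or later (like IE8 here) print 1 every time; it also uses `static` as a variable name, which is a reserved word in strict-mode code. A sketch of the same C-style static behaviour that should work in any engine, still without a closure, by storing the counter as a property of the function object (`countCalls` and `count` are hypothetical names):

```javascript
// Hypothetical name; the counter lives on the function object itself.
function countCalls() {
  // Functions are objects, so this property persists between calls.
  // Unlike a C static, any code holding countCalls can read it.
  countCalls.count = (countCalls.count || 0) + 1;
  console.log(countCalls.count);
  return countCalls.count;
}

var counts = [];
for (var i = 0; i < 5; ++i) {
  counts.push(countCalls());
}
// counts is [1, 2, 3, 4, 5] in any engine
```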