
I was experiencing some weird behaviour in my JavaScript code, but only in Firefox and Chrome; IE is fine.

I have isolated the problem and created a little page so you can see the behaviour yourself.

Essentially, it appears as if the regular expression object in MethodC is being reused across calls to MethodC, even though it's a local variable. Can someone explain this behaviour?

<html>
<head>
<script type="text/javascript">
    function RunDemo()
    {
        var subject = "01234 555 6789";

        for (var i = 1; i <= 10; i++) {
            MethodA(subject, i);
            MethodB(subject, i);
            MethodC(subject, i);
        }
    }

    // OK, OK, OK, OK, OK, OK, OK, OK, OK, OK
    function MethodA(subject, iteration)
    {
        var myRegexp = new RegExp("5", "g");
        var matches = myRegexp.exec(subject);
        AddItem(matches ? "OK" : "no match", "listA");
    }

    // OK, OK, OK, OK, OK, OK, OK, OK, OK, OK
    function MethodB(subject, iteration)
    {
        var myRegexp = /5/;
        var matches = myRegexp.exec(subject);
        AddItem(matches ? "OK" : "no match", "listB");
    }

    // OK, OK, OK, no match, OK, OK, OK, no match, OK, OK (in Firefox and Chrome; IE is fine)
    function MethodC(subject, iteration) {
        var myRegexp = /5/g;
        var matches = myRegexp.exec(subject);
        AddItem(matches ? "OK" : "no match", "listC");
    }

    function AddItem(itemText, listID) {
        var li = document.createElement("li");
        li.innerHTML = itemText;
        document.getElementById(listID).appendChild(li);
    }   

</script>
</head>
<body onload="RunDemo()">
    <h2>Method A</h2>    
    <ul id="listA"></ul>

    <h2>Method B</h2> 
    <ul id="listB"></ul>

    <h2>Method C</h2> 
    <ul id="listC"></ul>
</body>
</html>
A: 

This still happens when I change the variable names to myRegexp1, myRegexp2 and myRegexp3, so your premise can't be correct. FWIW, Firefox 4 beta doesn't exhibit the problem, only Chrome.

Robusto
I am not saying that the Regular Expression objects are shared between the different functions, but just between the function CALLS to MethodC. The number of OK's outputted before a "no match" is outputted is equal to the number of "5" characters in the subject. I suspect it matches the first occurrence on the first call, then on the second call it matches the next occurrence of "5", and so on. It seems to remember the position of the previous match, which seems weird to me.
Jaap
Hmm ... are you saying it is remembering the lastIndex property of the regex then?
Robusto
How weird. [The number of 5s is definitely affecting the results in Chrome (jsFiddle)](http://jsfiddle.net/NNRjh/).
Peter Ajtai
A:

This is partly covered in the Mozilla JavaScript Reference:

If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp/exec#Description

What I don't understand, though, is why the lastIndex property is kept after MethodC returns.

EDIT: I found this bug, which seems to describe exactly what you experienced here: https://bugzilla.mozilla.org/show_bug.cgi?id=98409

slosd
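To make the MDN excerpt concrete, here is a minimal sketch of the intended use of exec() with the g flag: calling it in a loop until it returns null, with lastIndex advancing after each match. The variable names are illustrative.

```javascript
// A global regex used with exec() in a loop: each call resumes the search
// at the regex's lastIndex, as the MDN description says.
const subject = "01234 555 6789";
const re = /5/g;
const found = [];
let m;
while ((m = re.exec(subject)) !== null) {
  // m.index is where this match starts; re.lastIndex is where the next search begins.
  found.push({ index: m.index, lastIndex: re.lastIndex });
}
console.log(found.map((f) => f.index).join(",")); // 6,7,8
// After the final failed exec(), lastIndex is reset to 0.
console.log(re.lastIndex); // 0
```

This is the same state that leaks between calls when a single regex object is shared: after the third match lastIndex is 9, so a search starting from that position fails.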
That last bit is what I am trying to find out :)
Jaap
Because a JavaScript function is just another object, and the regex is not recreated after the function is "constructed". In Method A, new is called explicitly, so a new regular expression is evaluated, constructed, and then assigned to the myRegexp variable (property). In Method B, the match is exhausted on every run-through. In Method C, the JS engine can look at the assignment to myRegexp and say "yeah, I got that already" until the match is exhausted. (You can watch the index in a debugger to see it in action: A and B both hit on 6 every iteration.)
Stan Rogers
I'm not sure that the bug I added to my answer is really what's causing this since it was already fixed months ago...
slosd
Probably didn't make it into whichever builds the OP is using yet. Chrome was doing the same thing last I checked (which wasn't all that recently, I admit).
no
A:

The lastIndex property is acting as a static variable in Chrome (and FF), but not in IE.

As to why this is happening, I'm not sure. You can work around it by using the string's .match() method instead:

function MethodC(subject, iteration) {
    var myRegexp = /5/g;
    var matches = subject.match(myRegexp);
    AddItem(matches ? "OK" : "no match", "listC");
}
Peter Ajtai
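A small check of why the .match() workaround is robust, assuming any ES5-or-later engine: for a global regex, String.prototype.match() resets lastIndex to 0 before it searches, so stale state from an earlier exec() call does not matter. Setting lastIndex by hand here stands in for that leftover state.

```javascript
const subject = "01234 555 6789";
const re = /5/g;

// Simulate state left behind by earlier exec() calls on a shared regex.
re.lastIndex = 9;
const stale = re.exec(subject); // exec() searches from index 9 and finds nothing
console.log(stale); // null

re.lastIndex = 9;
const matches = subject.match(re); // match() starts from index 0 regardless
console.log(matches.join(",")); // 5,5,5
console.log(re.lastIndex); // 0
```

One difference worth noting: with the g flag, match() returns an array of all matched strings rather than an exec()-style result with an index property, which is fine for the yes/no test in MethodC.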
A:

The optimizers in V8 and SpiderMonkey create a single regex object when they see a regex literal, and then reuse it.

Per ECMAScript 3 (ES3), this is compliant behavior, but it will become non-compliant in ECMAScript 5 (ES5).

7.8.5 Regular Expression Literals

A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (section 15.10.4) or calling the RegExp constructor as a function (section 15.10.3).

ECMAScript Language Specification Edition 3

Compare to:

7.8.5 Regular Expression Literals

A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (see 15.10.4) or calling the RegExp constructor as a function (15.10.3).

ECMAScript Language Specification Edition 5
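The practical effect of the ES5 wording can be seen in a sketch like this one, assuming an ES5-or-later engine (the helper makeRegexp is hypothetical). Because the literal is evaluated on each call, each call returns a distinct object whose lastIndex starts at 0. Under the ES3 wording quoted above, the literal would instead yield the same object every time, so first === second would be true.

```javascript
// Each call evaluates the literal again, so under ES5+ semantics it returns
// a brand-new RegExp object with lastIndex starting at 0.
function makeRegexp() {
  return /5/g;
}

const first = makeRegexp();
const second = makeRegexp();
console.log(first === second); // false: distinct objects

first.exec("01234 555 6789"); // advances first.lastIndex to 7
console.log(first.lastIndex, second.lastIndex); // 7 0
```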

Here are some workarounds:

  • Don't use the /g flag with exec.
  • Create the RegExp with the RegExp constructor instead of a regexp literal.

Doing either or both of these should, I think, make the problem go away.

no
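Both workarounds can be sketched as follows, assuming the same subject string as in the question; methodCConstructor is a hypothetical variant of MethodC. A non-global regex ignores lastIndex when it searches, and a RegExp built with the constructor inside the function is a fresh object on every call.

```javascript
const subject = "01234 555 6789";

// Workaround 1: no /g flag. A non-global regex always searches from index 0
// and ignores lastIndex, so even a shared object gives the same answer.
const nonGlobal = /5/;
nonGlobal.lastIndex = 9; // ignored because the regex is not global
const firstIndex = nonGlobal.exec(subject).index;
console.log(firstIndex); // 6

// Workaround 2: build the regex with the constructor inside the function,
// so every call gets a fresh object whose lastIndex is 0.
function methodCConstructor(text) {
  const myRegexp = new RegExp("5", "g");
  return myRegexp.exec(text) ? "OK" : "no match";
}

const results = [];
for (let i = 1; i <= 10; i++) {
  results.push(methodCConstructor(subject));
}
console.log(results.every((r) => r === "OK")); // true
```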
Thanks for the workarounds. Can you provide a link that confirms your statement about the reuse of RegExp objects created from literals?
Jaap
Sure, I updated my answer. Looks like I had the specs backwards; the 3rd edition actually did specify this behavior, so the weird alternating matches are actually per standard... At least, until ES5 is finalized.
no