Some Context
From JavaScript: The Definitive Guide:
When regexp is a global regular expression, however, exec() behaves in a slightly more complex way. It begins searching string at the character position specified by the lastIndex property of regexp. When it finds a match, it sets lastIndex to the position of the first character after the match.
I think anyone who works with JavaScript RegExps on a regular basis will recognize this passage. However, I have found some strange behavior in this method.
The Problem
Consider the following code:
>> rx = /^(.*)$/mg
>> tx = 'foo\n\nbar'
>> rx.exec(tx)
[foo,foo]
>> rx.lastIndex
3
>> rx.exec(tx)
[,]
>> rx.lastIndex
4
>> rx.exec(tx)
[,]
>> rx.lastIndex
4
>> rx.exec(tx)
[,]
>> rx.lastIndex
4
The RegExp seems to get stuck on the second line and doesn't increment the lastIndex property. This seems to contradict The Rhino Book. If I set it myself as follows, it continues and eventually returns null as expected, but it seems like I shouldn't have to.
>> rx.lastIndex = 5
5
>> rx.exec(tx)
[bar,bar]
>> rx.lastIndex
8
>> rx.exec(tx)
null
Conclusion
Obviously I can increment the lastIndex property any time the match is the empty string. However, being the inquisitive type, I want to know why it isn't incremented by the exec method. Why isn't it?
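For reference, the manual workaround I mean looks roughly like this. It is only a sketch: the helper name matchLines is my own invention, and it assumes the regex carries the g flag (without it, exec ignores lastIndex and the loop would never end).

```javascript
// Hypothetical helper (not part of any library): collect group 1 of every
// match, stepping past empty matches by hand so exec() cannot stall.
// Assumes `rx` has the g flag.
function matchLines(rx, text) {
  var lines = [], m;
  rx.lastIndex = 0;                 // always start from the beginning
  while ((m = rx.exec(text)) !== null) {
    lines.push(m[1]);
    if (m[0].length === 0) {
      rx.lastIndex++;               // empty match: exec() left lastIndex at m.index
    }
  }
  return lines;
}

var result = matchLines(/^(.*)$/mg, 'foo\n\nbar');
console.log(result); // [ 'foo', '', 'bar' ]
```

With the increment in place, the blank second line comes back as an empty string and the loop ends with null after 'bar'.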
Notes
I have observed this behavior in Chrome and Firefox. It seems to happen only when there are adjacent newlines.
[edit]
Tomalak says below that changing the pattern to /^(.+)$/gm will keep the expression from getting stuck, but the blank line is ignored. Can this be altered to still match the blank line? Thanks for the answer, Tomalak!
[edit]
Using the following pattern and using group 1 works for all strings I can think of. Thanks again to Tomalak.
/^(.*)((\r\n|\r|\n)|$)/gm
[edit]
The previous pattern returns the blank line. However, if you don't care about the blank lines, Tomalak gives the following solution, which I think is cleaner.
/^(.*)[\r\n]*/gm
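To compare the three patterns side by side, here is a small sketch run against the same test string as above. The helper groupOnes is a name I made up for illustration; its empty-match guard never fires on this particular input and is only there so the loop cannot stall on other inputs.

```javascript
// Hypothetical helper: collect group 1 of every match of a global regex.
function groupOnes(rx, text) {
  var out = [], m;
  rx.lastIndex = 0;
  while ((m = rx.exec(text)) !== null) {
    out.push(m[1]);
    if (m[0].length === 0) rx.lastIndex++;  // safety net for empty matches
  }
  return out;
}

var tx = 'foo\n\nbar';

var skipBlank = groupOnes(/^(.+)$/gm, tx);                // blank line is skipped
var keepBlank = groupOnes(/^(.*)((\r\n|\r|\n)|$)/gm, tx); // blank line is kept
var dropBlank = groupOnes(/^(.*)[\r\n]*/gm, tx);          // blank line is swallowed

console.log(skipBlank); // [ 'foo', 'bar' ]
console.log(keepBlank); // [ 'foo', '', 'bar' ]
console.log(dropBlank); // [ 'foo', 'bar' ]
```

The second pattern keeps the blank line because it consumes the line terminator itself, so that match is never zero-length in the middle of the string.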
[edit]
Both of the previous two solutions get stuck on trailing newlines, so you have to either strip them or increment lastIndex manually.
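A quick sketch of what I mean, using the shorter pattern and a string that ends in a newline. The first loop is deliberately bounded to five calls so it cannot hang; the second strips the trailing line breaks first, which is one of the two options mentioned above. The variable names are just for illustration.

```javascript
var rx = /^(.*)[\r\n]*/gm;
var tx = 'foo\nbar\n';

// Bounded probe: call exec() five times and record where each match starts.
var indexes = [];
for (var i = 0; i < 5; i++) {
  var m = rx.exec(tx);
  indexes.push(m ? m.index : null);
}
console.log(indexes); // [ 0, 4, 8, 8, 8 ]  (stuck on the empty match at the end)

// Option 1: strip trailing line breaks before matching.
var stripped = tx.replace(/[\r\n]+$/, '');
var lines = [];
var m2;
rx.lastIndex = 0;
while ((m2 = rx.exec(stripped)) !== null) {
  lines.push(m2[1]);
}
console.log(lines); // [ 'foo', 'bar' ]
```

The empty match at index 8 happens because, with the m flag, ^ also matches right after the final newline, even though nothing follows it.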
[edit]
I found a great article detailing the cross-browser issues with lastIndex over at Flagrant Badassery. Besides the awesome blog name, the article gave me a much more in-depth understanding of the issue along with a good cross-browser solution. The solution is as follows:
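var rx = /^/gm,
    tx = 'A\nB\nC',
    m;
while (m = rx.exec(tx)) {
    // Some browsers leave lastIndex one past a zero-length match; normalize it first
    if (!m[0].length && rx.lastIndex > m.index) {
        --rx.lastIndex;
    }
    foo(); // placeholder: do something with the match here
    // Step past a zero-length match so the next exec() cannot return it again
    if (!m[0].length) {
        ++rx.lastIndex;
    }
}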
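To try the loop without defining foo, here is a sketch that wraps it in a function and records where each match starts. The name eachMatch and the callback parameter are my own additions, not something from the article.

```javascript
// Hypothetical wrapper: calls `visit` for each match of a global regex,
// applying the same two lastIndex adjustments as the loop above.
function eachMatch(rx, text, visit) {
  var m;
  rx.lastIndex = 0;
  while ((m = rx.exec(text)) !== null) {
    if (!m[0].length && rx.lastIndex > m.index) {
      --rx.lastIndex;               // undo an engine-side bump, if there was one
    }
    visit(m);
    if (!m[0].length) {
      ++rx.lastIndex;               // move past the zero-length match
    }
  }
}

var starts = [];
eachMatch(/^/gm, 'A\nB\nC', function (m) { starts.push(m.index); });
console.log(starts); // [ 0, 2, 4 ]
```

Each line start is visited exactly once, and the loop ends with null after the last line.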