When I write a regular expression like:

var m = /(s+).*?(l)[^l]*?(o+)/.exec("this is hello to you");
console.log(m);

I get a match object containing the following:

{
    0: "s is hello",
    1: "s",
    2: "l",
    3: "o",
    index: 3,
    input: "this is hello to you"
}

I know the index of the entire match from the 'index' property, but I also need to know the start and end of each matched group. A simple string search won't work: in this example it would find the first 'l' rather than the one matched by the group.
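
To illustrate the problem, here is a minimal sketch of the naive approach, searching for the group's text with `indexOf` starting at the match index (the variable names `input` and `guess` are only for illustration):

```javascript
const input = "this is hello to you";
const m = /(s+).*?(l)[^l]*?(o+)/.exec(input);

// Naive approach: look for the text of group 2 starting at the match index.
const guess = input.indexOf(m[2], m.index);
console.log(guess); // 10, the first "l" of "hello"

// The "l" captured by group 2 is the second one, at offset 11:
// the lazy .*? had to grow past the first "l" so that (o+) could match.
```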

Is there any way to get the offset of a matched group?

+1  A: 

You can't directly get the index of a match group. What you have to do is first put every character in a match group, even the ones you don't care about:

var m= /(s+)(.*?)(l)([^l]*?)(o+)/.exec('this is hello to you');

Now you've got the whole match in parts:

['s is hello', 's', ' is hel', 'l', '', 'o']

So you can add up the lengths of the strings before your group to get the offset from the match index to the group index:

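// Offset of group n in the input, assuming the groups tile the whole match.
function indexOfGroup(match, n) {
    var ix = match.index;
    for (var i = 1; i < n; i++) {
        // A group that did not participate in the match is undefined.
        if (match[i] !== undefined)
            ix += match[i].length;
    }
    return ix;
}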

console.log(indexOfGroup(m, 3)); // 11
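
Since the question asks for the end of each group as well, the same idea can be extended: the end is the start plus the group's own length. The following is only a sketch; `groupSpan` is a hypothetical helper name, and it assumes every character of the match sits inside exactly one top-level group, with no nested groups.

```javascript
// Hypothetical helper: returns [start, end) of group n within the input.
// Assumes the capture groups tile the whole match and are not nested.
function groupSpan(match, n) {
    let start = match.index;
    for (let i = 1; i < n; i++) {
        // A group that did not participate is undefined; count it as empty.
        start += (match[i] || "").length;
    }
    const text = match[n] || "";
    return [start, start + text.length];
}

const input = "this is hello to you";
const m = /(s+)(.*?)(l)([^l]*?)(o+)/.exec(input);
const [start, end] = groupSpan(m, 3);
console.log(start, end);              // 11 12
console.log(input.slice(start, end)); // "l"
```

Slicing the input with the returned pair is a quick way to check that the span really covers the captured text.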
bobince
Nice solution. But in my case I need to add the extra parentheses automatically, fix any backreferences, and remember the original group numbers. It is for a syntax highlighter with scope matching, and the current solution is to use the half-done highlighter to analyse the regexp syntax and then do all sorts of things to the abstract syntax tree afterwards. I would love a simpler solution than incorporating those 300 lines of code.
Otey
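
For readers on newer JavaScript engines, a simpler route may be available: ES2022 added the `d` (hasIndices) regular expression flag, which makes `exec` attach an `indices` array to the match. This is a sketch that assumes the runtime supports that flag (recent browsers and Node.js releases do; older ones throw a SyntaxError on the unknown flag). The original regex needs no extra parentheses.

```javascript
// Assumes an engine with ES2022 `d` flag support.
const re = /(s+).*?(l)[^l]*?(o+)/d;
const match = re.exec("this is hello to you");

// match.indices[n] is a [start, end) pair for group n, or undefined
// if that group did not take part in the match.
const [start, end] = match.indices[2];
console.log(start, end);       // 11 12
console.log(match.indices[0]); // [ 3, 13 ], the span of the whole match
```

Because the group numbers stay as written, backreferences in the pattern would not need renumbering with this approach.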