I have a RegExp like the following simplified example:

var exp = /he|hell/;

When I run it on a string, it returns the first match, e.g.:

var str = "hello world";
var match = exp.exec(str);
// match contains ["he"];

I want the first and longest possible match; by that I mean sorted by index, then by length.

Since the expression is combined from an array of RegExps, I am looking for a way to find the longest match without having to rewrite the regular expression.

Is that even possible?

If it isn't, I am looking for a way to analyze the expression easily and rearrange its parts into the proper order. But I can't figure out how, since the expressions could be a lot more complex, e.g.:

var exp = /h..|hel*/;
A: 

How about /hell|he/ ?
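A quick illustration of why reordering helps (the variable names are only for this example): in a JavaScript regex the alternatives of `|` are tried left to right at each start position, and the first one that succeeds wins, so putting the longer literal first changes the result.

```javascript
// At a given start position, alternatives are tried left to right and the
// engine stops at the first one that succeeds.
const str = "hello world";

const shortFirst = /he|hell/;
const longFirst = /hell|he/;

console.log(shortFirst.exec(str)[0]); // he
console.log(longFirst.exec(str)[0]);  // hell
```

This only works when you can order the alternatives by hand; for patterns like `/h..|hel*/` the length of each alternative's match is not obvious from its source.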

S.Mark
A: 

All regex implementations I know of will (try to) match characters/patterns from left to right and terminate as soon as they find an overall match.

In other words: if you want to make sure you get the longest possible match, you'll need to try all your patterns (separately), store all the matches, and then pick the longest one.
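A minimal sketch of that approach, assuming the patterns are still available as a separate array of RegExp objects (the helper name `firstLongestMatch` and the sample patterns are hypothetical): run each pattern on its own, keep each one's first match, then pick the smallest index and, on a tie, the longest text.

```javascript
// Hypothetical helper: run every pattern separately and return the match
// that starts earliest, preferring the longest one when start indexes tie.
function firstLongestMatch(patterns, str) {
  let best = null;
  for (const pattern of patterns) {
    // Copy without the g/y flags so exec() always searches from index 0
    // and never depends on a leftover lastIndex.
    const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
    const m = re.exec(str);
    if (m === null) continue;
    if (
      best === null ||
      m.index < best.index ||
      (m.index === best.index && m[0].length > best[0].length)
    ) {
      best = m;
    }
  }
  return best; // null when no pattern matches
}

const match = firstLongestMatch([/he/, /hell/], "hello world");
console.log(match.index, match[0]); // 0 hell
```

Note that each pattern still contributes only its own first match, so alternations nested inside a single pattern keep their left-to-right behaviour.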

Bart Kiers
I know. I edited the question. Thanks for the answer. I will start by finding the index of the first match, then add ^ to each RegExp and search the substring starting at that index, since searching for expressions that aren't there requires scanning the whole text.
Otey
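The two-step idea in the comment above can be sketched as follows. Instead of prepending `^` and slicing the string, this version swaps in the ES2015 sticky flag (`y`), which makes `exec()` match only at `lastIndex`; that is a substitute technique, not the commenter's code. It assumes all patterns share the same flags and contain no numbered backreferences (wrapping and joining them would renumber the groups); the name `firstLongestAnchored` is hypothetical.

```javascript
// Hypothetical helper: find the leftmost match index with the combined
// alternation, then try each pattern anchored at exactly that index.
function firstLongestAnchored(patterns, str) {
  if (patterns.length === 0) return null;

  // Step 1: an alternation only fails at a position when every alternative
  // fails there, so the combined regex finds the leftmost start of any pattern.
  // Assumes all patterns share patterns[0]'s flags.
  const baseFlags = patterns[0].flags.replace(/[gy]/g, "");
  const combined = new RegExp(
    patterns.map((p) => `(?:${p.source})`).join("|"),
    baseFlags
  );
  const first = combined.exec(str);
  if (first === null) return null;

  // Step 2: run each pattern with the sticky flag at that index; keep the longest.
  let best = null;
  for (const pattern of patterns) {
    const sticky = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, "") + "y");
    sticky.lastIndex = first.index; // a sticky regex matches only at lastIndex
    const m = sticky.exec(str);
    if (m !== null && (best === null || m[0].length > best[0].length)) {
      best = m;
    }
  }
  return best;
}

const result = firstLongestAnchored([/he/, /hell/], "hello world");
console.log(result.index, result[0]); // 0 hell
```

Unlike slicing the string, the sticky approach keeps the preceding text visible, so lookbehinds and `\b` still see the characters before the match.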
A: 

You cannot do "longest match" (or anything involving counting, minus look-aheads) with regular expressions.

Your best bet is to find all matches and simply compare their lengths in your program.

BlueRaja - Danny Pflughoeft