I'm looking at a regular expression in our source code, and I'm unsure if it's correct.

<<SWE.[^<<]*>>

Specifically, what does the [^<<] part do? I thought the brackets allowed you to supply a range. Does it exclude "<<" or just a single "<"?

If this was a line of text being parsed:

<<SWE.SomeText>><<SWE.SomeMoreText>>

I think the author's intent is to have two matches instead of one.

match[0] = <<SWE.SomeText>>
and
match[1] = <<SWE.SomeMoreText>>

instead of

match[0] = <<SWE.SomeText>><<SWE.SomeMoreText>>

Is there a better way? What about <<SWE.*?>>

A: 

RegexBuddy says:

Match any character that is not a “<” «[^<<]*»

Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

ip
+2  A: 

I think the expression you want is:

<<SWE\.[^>]*>>

That will match the two examples you gave.

Phil M
+4  A: 

It looks to me like wishful thinking. [^<<] will match anything other than a <. Whether it's << or < is irrelevant. [^<] and [^<<] mean the same thing.
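
A quick way to check this is to run both spellings against the sample line from the question. This is only a sketch using Python's `re` module (the question does not say which regex flavor is in use, so that choice is an assumption); the variable names are made up for illustration.

```python
import re

# Sample line taken from the question.
line = "<<SWE.SomeText>><<SWE.SomeMoreText>>"

# The pattern as found in the source code, with the doubled "<" in the class.
doubled = re.compile(r"<<SWE.[^<<]*>>")
# The same pattern with the redundant "<" removed.
single = re.compile(r"<<SWE.[^<]*>>")

doubled_matches = doubled.findall(line)
single_matches = single.findall(line)

print(doubled_matches)  # ['<<SWE.SomeText>>', '<<SWE.SomeMoreText>>']
print(single_matches)   # ['<<SWE.SomeText>>', '<<SWE.SomeMoreText>>']
```

Both patterns return the same two matches here: the class stops at the next "<", and the engine then backs up two characters so the trailing >> can match.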

Daniel Straight
A: 

[^<<] is equivalent to [^<]; listing a character twice in a character class is redundant. I would also expect the class to be [^>], with a right angle bracket. And the dot should be escaped as "\.".

I agree with your regex: <<SWE\..*?>> is better. If it matters, though, the non-greedy operator could cause unanticipated backtracking in a non-matching string whereas [^>]* would not involve any backtracking and so could be more efficient.
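
The two alternatives are not quite interchangeable, which may matter when choosing between them. The sketch below assumes Python's `re` module, and the `tricky` input is a hypothetical string that is not from the question: it has a single ">" inside the tag body.

```python
import re

lazy = re.compile(r"<<SWE\..*?>>")       # non-greedy version
negated = re.compile(r"<<SWE\.[^>]*>>")  # negated character class version

# Sample line from the question: both patterns find the two tags.
line = "<<SWE.SomeText>><<SWE.SomeMoreText>>"
lazy_matches = lazy.findall(line)
negated_matches = negated.findall(line)
print(lazy_matches)     # ['<<SWE.SomeText>>', '<<SWE.SomeMoreText>>']
print(negated_matches)  # ['<<SWE.SomeText>>', '<<SWE.SomeMoreText>>']

# Hypothetical input with a lone ">" inside the tag body.
tricky = "<<SWE.a>b>>"
lazy_tricky = lazy.findall(tricky)
negated_tricky = negated.findall(tricky)
print(lazy_tricky)     # ['<<SWE.a>b>>']
print(negated_tricky)  # []
```

If a lone ">" can never appear inside a tag, the two behave the same on well-formed input; if it can, only the non-greedy version will match such tags.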

John Kugelman
`[^>]*` still could involve backtracking if the part that comes after it can't match. To really kill backtracking you should make it possessive, like this `[^>]*+`.
Geert
A: 

You're right: [^<<] excludes only a single <; the second < is redundant.

It certainly appears that the original intent was to stop the match at the first >> rather than let it run on to a later one. The better way to do that is to use the non-greedy *? instead of *, as in your final pattern, <<SWE.*?>>.

One thing to note: it looks like you want the prefix within the tags to be the literal "SWE.", so you should add an escaped period to the pattern, in addition to the "any character" period. Thus:

<<SWE\..*?>>
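
To see why the escaped period matters, here is a small sketch, again assuming Python's `re` module. The `<<SWEET>>` input is a hypothetical tag invented for the comparison, not something from the question.

```python
import re

unescaped = re.compile(r"<<SWE.*?>>")   # no literal period required after SWE
escaped = re.compile(r"<<SWE\..*?>>")   # requires a literal period after SWE

# Hypothetical tag that starts with "SWE" but has no period after it.
other_tag = "<<SWEET>>"
# Tag in the format shown in the question.
swe_tag = "<<SWE.SomeText>>"

unescaped_hits_other = unescaped.search(other_tag) is not None
escaped_hits_other = escaped.search(other_tag) is not None
escaped_hits_swe = escaped.search(swe_tag) is not None

print(unescaped_hits_other)  # True
print(escaped_hits_other)    # False
print(escaped_hits_swe)      # True
```

Note that this pattern requires at least one literal period after SWE; the characters after it may be empty.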
bdukes
A: 

What flavor of regex are you using?

If you're using something exotic, '<<' and '>>' could stand for word boundaries, inside and outside of the character class.

rooskie
No, a character class always matches exactly one character. The word boundary construct is a zero-width assertion--it doesn't consume any characters.
Alan Moore
A: 

[^...] is a negated character class--it matches any single character that is NOT one of the listed characters.

This matches the first tag only:
<<SWE.*?>>

This matches a run of one or more consecutive tags as a single match:
(?:<<SWE.*?>>)+

This matches everything between << and the last >> (including more >>'s):
<<SWE.*>>
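
The difference between the three patterns can be sketched as follows, assuming Python's `re` module. The `spaced` string is a hypothetical input added so that the second and third patterns give different results; `adjacent` is the line from the question.

```python
import re

first_tag = re.compile(r"<<SWE.*?>>")     # non-greedy: stops at the first >>
tag_run = re.compile(r"(?:<<SWE.*?>>)+")  # one or more tags back to back
greedy = re.compile(r"<<SWE.*>>")         # greedy: runs to the last >>

adjacent = "<<SWE.SomeText>><<SWE.SomeMoreText>>"  # from the question
spaced = "<<SWE.A>> and <<SWE.B>>"                 # hypothetical input

first_adjacent = first_tag.search(adjacent).group()
run_adjacent = tag_run.search(adjacent).group()
run_spaced = tag_run.search(spaced).group()
greedy_spaced = greedy.search(spaced).group()

print(first_adjacent)  # <<SWE.SomeText>>
print(run_adjacent)    # <<SWE.SomeText>><<SWE.SomeMoreText>>
print(run_spaced)      # <<SWE.A>>
print(greedy_spaced)   # <<SWE.A>> and <<SWE.B>>
```

The grouped version only joins tags that touch each other, while the greedy version swallows whatever lies between the first << and the last >>.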

steamer25