I need help narrowing the scope of this regex so that it does not match when an alphanumeric character precedes the first "I":

"I([ ]{1,2})([a-zA-Z]|\d){2,13}"

I want to capture I APF from this string, but not the I ARPT (the tail of MUNI ARPT):

I APF                     'NAPLES MUNI ARPT.            ' 42894 JEB 29785584

Thanks!

+5  A: 

\b represents a word boundary in regular expressions, so the following should work (assuming you're happy with the rest of the regex):

("\bI([ ]{1,2})([a-zA-Z]|\d){2,13}")

A word boundary is a zero-width position between a word character and a non-word character (or the start or end of the string). Depending on your regex engine, a word character is likely to be an alphanumeric character or an underscore, so using \b will match I ALF in -I ALF but not in _I ALF.
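
To make the effect concrete, here is a minimal sketch in Python's re module (the question doesn't name an engine, so Python is an assumption, as are the variable names). It runs the original pattern and the \b version against the sample record from the question:

```python
import re

# Sample record copied from the question.
record = "I APF                     'NAPLES MUNI ARPT.            ' 42894 JEB 29785584"

# Original pattern: also matches the "I ARPT" that ends "MUNI ARPT".
original = re.compile(r"I([ ]{1,2})([a-zA-Z]|\d){2,13}")

# Same pattern anchored with a leading word boundary.
bounded = re.compile(r"\bI([ ]{1,2})([a-zA-Z]|\d){2,13}")

original_hits = [m.group(0) for m in original.finditer(record)]
bounded_hits = [m.group(0) for m in bounded.finditer(record)]

print(original_hits)
print(bounded_hits)

# The hyphen/underscore difference described above:
hyphen_match = re.search(r"\bI ALF", "-I ALF")       # "-" is not a word character
underscore_match = re.search(r"\bI ALF", "_I ALF")   # "_" is a word character
```

In Python, the "I" in MUNI is preceded by another word character, so there is no boundary there and only the leading I APF survives.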

Gareth
+2  A: 
\bI[ ]{1,2}[A-Za-z0-9]{2,13}
Ned Batchelder
A: 

You could try doing a negative look-behind:

(?<![a-zA-Z0-9])I([ ]{1,2})([a-zA-Z]|\d){2,13}

I'm not sure how widely this is supported, though (i.e. across different regex libraries).
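
For what it's worth, Python's re module does accept this kind of fixed-width look-behind. A rough sketch, assuming Python (the variable names are made up for illustration), which also shows the one place it behaves differently from \b: an underscore before the "I" is not in [a-zA-Z0-9], so the look-behind version still matches there:

```python
import re

# Sample record copied from the question.
record = "I APF                     'NAPLES MUNI ARPT.            ' 42894 JEB 29785584"

# Reject a match only when the character before "I" is a letter or digit.
lookbehind = re.compile(r"(?<![a-zA-Z0-9])I([ ]{1,2})([a-zA-Z]|\d){2,13}")

lookbehind_hits = [m.group(0) for m in lookbehind.finditer(record)]
print(lookbehind_hits)

# An underscore is not alphanumeric, so the look-behind lets this through,
# whereas \b treats "_" as a word character and finds no boundary.
after_underscore = lookbehind.search("_I ALF")
with_boundary = re.search(r"\bI ALF", "_I ALF")
```

Which behaviour is right depends on whether an underscore should count as "preceding text" in your data.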

Tom Haigh
+1  A: 

The word boundary seems like a good solution. You don't tell us which regex engine or language you will use: JavaScript, for example, had no look-behind until ES2018.
And as the others pointed out, your expression is a bit too convoluted (which hurts speed).
My version would be:

\bI  ?[A-Za-z\d]{2,13}

With or without captures, depending on your needs. You might want to end the expression with \b too, to ensure there are no more alphanumeric characters after the expression.
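
A small sketch of what the trailing \b changes, assuming Python's re module (the long token below is a made-up example, not from the question). Without it, a token longer than 13 characters is silently truncated to its first 13; with it, the match fails instead:

```python
import re

# Sample record copied from the question.
record = "I APF                     'NAPLES MUNI ARPT.            ' 42894 JEB 29785584"

# One mandatory space plus one optional space, then 2 to 13 alphanumerics.
open_ended = re.compile(r"\bI  ?[A-Za-z\d]{2,13}")
closed = re.compile(r"\bI  ?[A-Za-z\d]{2,13}\b")

# Hypothetical record whose token is 16 characters long.
long_token = "I ABCDEFGHIJKLMNOP"

truncated = open_ended.search(long_token)   # stops after 13 characters
rejected = closed.search(long_token)        # no boundary within 13 characters

record_match = closed.search(record)
print(truncated.group(0), rejected, record_match.group(0))
```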

PhiLho