Currently I use this reg ex:
"\bI([ ]{1,2})([a-zA-Z]|\d){2,13}\b"
It was just brought to my attention that the text that I use this against could contain a "\
" (backslash). How do I add this to the expression?
Thanks
Currently I use this reg ex:
"\bI([ ]{1,2})([a-zA-Z]|\d){2,13}\b"
It was just brought to my attention that the text that I use this against could contain a "\
" (backslash). How do I add this to the expression?
Thanks
This expression could be simplified if you're also allowing the underscore character in the second capture register, and you are willing to use metacharacters. That changes this:
([a-zA-Z]|\d){2,13}
into this ...
([\w]{2,13})
and you can also add a test for the backslash character with this ...
([\w\x5c]{2,13})
which makes the regex just a tad easier to eyeball, depending on your personal preference.
"\bI([\x20]{1,2})([\w\x5c]{2,13})\b"
See also:
Both @slavy13 and @dreftymac give you the basic solution with pointers, but...
\d
inside a character class to mean a digit.[:alpha:]
inside a character class to mean an alpha character, [:digit:]
to mean a digit, and [:alnum:]
to mean an alphanumeric (specifically not including underscore, unlike \w
). Note that these character classes might mean more characters than you expect; think of accented characters and non-arabic digits, especially in Unicode.Contrast the behaviour of these two one-liners:
perl -n -e 'print "$2\n" if m/\bI( {1,2})([a-zA-Z\d\\]){2,13}\b/'
perl -n -e 'print "$2\n" if m/\bI( {1,2})([a-zA-Z\d\\]{2,13})\b/'
Given the input line "I a123
", the first prints "3" and the second prints "a123". Obviously, if all you wanted was the last character of the second part of the string, then the original expression is fine. However, that is unlikely to be the requirement. (Obviously, if you're only interested in the whole lot, then using '$&
' gives you the matched text, but it has negative efficiency implications.)
I'd probably use this regex as it seems clearest to me:
m/\bI( {1,2})([[:alnum:]\\]{2,13})\b/
Time for the obligatory plug: read Jeff Friedl's "Mastering Regular Expressions".
As I pointed out in my comment to slavy's post, \\
-> \b
as a backslash is not a word character. So my suggestion is
/\bI([ ]{1,2})([\p{IsAlnum}\\]{2,13})(?:[^\w\\]|$)/
I assumed that you wanted to capture the whole 2-13 characters, not just the first one that applies, so I adjusted my RE.
You can make the last capture a lookahead if the engine supports it and you don't want to consume it. That would look like:
/\bI([ ]{1,2})([\p{IsAlnum}\\]{2,13})(?=[^\w\\]|$)/