Using Python's re module, how do I get the equivalent of "\w" (which matches alphanumeric characters) WITHOUT matching numeric characters (those matched by "[0-9]")?

Note that the basic need is to match any alphanumeric character (including all Unicode variants) except the numeric ones (those matched by "[0-9]").

As a final note, I really need a regexp, as it is part of a larger regexp.

Underscores should not be matched.

EDIT: I hadn't thought about how underscores should be handled, so thanks for the warnings that "\w" matches them, and for the accepted solution, which addresses this issue.

+4  A: 
(?!\d)\w

This matches at a position that is not followed by a digit, and then consumes one \w character. The negative look-ahead cancels out the digits while keeping the rest of the \w range.

The same could be expressed as a positive look-ahead and \D:

(?=\D)\w

To match several of these in a row, wrap the look-ahead and the \w in a non-capturing group and repeat it:

(?:(?!\d)\w)+
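
As a rough illustration of the look-ahead patterns above, here is a minimal sketch using Python 3's re module. The sample string and variable names are made up for the example; note that the underscore still matches because it is part of \w.

```python
import re

# Negative look-ahead: one or more word characters, none of which is a digit.
no_digit_word = re.compile(r"(?:(?!\d)\w)+")

# Equivalent form using a positive look-ahead and \D.
no_digit_word_alt = re.compile(r"(?:(?=\D)\w)+")

# Hypothetical sample text; \u00e9 and \u00e0 are the accented letters in "deja".
sample = "abc123 d\u00e9j\u00e0_vu 42x"

matches = no_digit_word.findall(sample)
matches_alt = no_digit_word_alt.findall(sample)

print(matches)      # the underscore is kept, the digits are not
print(matches_alt)  # same result as the negative look-ahead form
```

Both patterns split the text at digits and at non-word characters, so "abc123" yields only "abc".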
Tomalak
Don't forget that \w also contains the underscore.
Tim Pietzcker
The OP said nothing about the underscore. How is that relevant?
Tomalak
Just in case the OP doesn't expect it. I like your solution.
Tim Pietzcker
+13  A: 

You want [^\W\d]: the set of characters that are neither non-word characters nor digits, in other words the word characters minus the digits. Add an underscore to that negated set ([^\W\d_]) if you don't want underscores either.

A bit twisted, if you ask me, but it works. Should be faster than the lookahead alternative.
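
A minimal sketch of the negated character class in Python 3, including its use inside a larger pattern as the question requires. The sample strings and the key=value pattern are invented for illustration only.

```python
import re

# Word characters minus digits and minus the underscore: letters only.
letters_only = re.compile(r"[^\W\d_]+")

# Hypothetical sample text; \u00e9 and \u00e0 are accented letters.
sample = "abc123 d\u00e9j\u00e0_vu 42x"
letter_runs = letters_only.findall(sample)
print(letter_runs)  # the underscore now splits the middle word in two

# The class drops into a larger regexp like any other character class.
# Here: a purely alphabetic key, an equals sign, then a number.
key_value = re.compile(r"([^\W\d_]+)=(\d+)")
pairs = key_value.findall("gr\u00f6\u00dfe=42 x1=7")
print(pairs)  # "x1=7" is rejected because its key contains a digit
```

One thing to keep in mind: for str patterns in Python 3, \d matches every Unicode decimal digit, not only [0-9], so both solutions exclude those as well. If only ASCII digits should be excluded, [^\W0-9_] is a possible variant.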

chrispy
Nice, +1 from me. Didn't think of that one.
Tomalak
Great idea, and one that can be reused with other patterns and other regex implementations.
vaab