
I have the following regular expression, which I think should match any character that is not alphanumeric, a space, '!', '?', or '.':

a = re.compile(r'[^A-z ?!.]')

However, I get the following unexpected result in IPython:

In [21]: re.sub(a, ' ', 'Hey !$%^&*.#$%^&.')
Out[21]: 'Hey !  ^  .   ^ .'

The result is the same when I escape the '.' in the regular expression.
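A minimal standalone reproduction, assuming Python 3 (the names `text`, `plain` and `escaped` are made up for illustration). It shows that the caret survives the substitution, and that escaping '.' makes no difference because a '.' inside a character class is already a literal dot:

```python
import re

text = 'Hey !$%^&*.#$%^&.'

# The pattern from the question, with '.' unescaped and escaped.
plain = re.compile(r'[^A-z ?!.]')
escaped = re.compile(r'[^A-z ?!\.]')

# '^' is kept, while the other symbols are replaced by spaces.
print(plain.sub(' ', text))  # Hey !  ^  .   ^ .

# Inside [...] the '.' is literal, so both patterns behave the same.
print(escaped.sub(' ', text) == plain.sub(' ', text))  # True
```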

How do I match the caret so that it is removed from the string as well?

A:

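You have an error in your regular expression: the case of the A and the z matters. A-z includes every character from ASCII code 65 (A) to 122 (z), and that range also contains the six non-letter characters between Z (90) and a (97): [ \ ] ^ _ and `. The caret (ASCII code 94) is one of them, so your negated class never matches it.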
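To see exactly which extra characters sneak into A-z, a short sketch (assuming Python 3; `extras` is just an illustrative name) that walks the code points from 'A' to 'z' and keeps the ones that are not letters:

```python
# List every character in the range A-z that is not a letter.
# In ASCII, 'Z' is 90 and 'a' is 97, so codes 91-96 sit in between.
extras = [chr(code) for code in range(ord('A'), ord('z') + 1)
          if not chr(code).isalpha()]

print(extras)    # ['[', '\\', ']', '^', '_', '`']
print(ord('^'))  # 94
```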

Try this instead:

re.compile('[^A-Za-z ?!.]')

Example:

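import re

# Separate A-Z and a-z ranges, so the ASCII punctuation between
# 'Z' and 'a' (including '^') is no longer excluded from the match.
regex = re.compile(r'[^A-Za-z ?!.]')
result = regex.sub(' ', 'Hey !$%^&*.#$%^&.')
print(result)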

Result:

Hey !     .     .
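The question says "alphanumeric", but neither A-z nor A-Za-z covers digits. If digits should survive as well, one possible variant adds 0-9 to the class. This is a sketch assuming that is the intent; `keep_alnum` and the trailing " 42" in the sample text are illustrative additions:

```python
import re

# Keep letters, digits, space, '?', '!' and '.'; replace everything else.
keep_alnum = re.compile(r'[^A-Za-z0-9 ?!.]')

print(keep_alnum.sub(' ', 'Hey !$%^&*.#$%^&. 42'))
```

With this pattern the digits at the end are preserved, while '$', '%', '^', '&', '*' and '#' are all replaced.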
Mark Byers
A: 

The caret falls between the uppercase and lowercase letters in ASCII. You need [^a-zA-Z ?!\.] (inside a character class the '.' is already literal, so the backslash is optional).
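As a quick sanity check, here is a sketch (assuming Python 3.4+ for `fullmatch`; the names `mark` and `bob` simply label the two answers' patterns) confirming that both suggested character classes match exactly the same ASCII characters, and that '^' is now matched:

```python
import re

mark = re.compile(r'[^A-Za-z ?!.]')
bob = re.compile(r'[^a-zA-Z ?!\.]')

# Compare the two character classes on every ASCII character.
same = all(bool(mark.fullmatch(chr(c))) == bool(bob.fullmatch(chr(c)))
           for c in range(128))

print(same)                     # True
print(bool(bob.fullmatch('^'))) # True: the caret is now replaced
```

The order of ranges inside a character class does not matter, which is why a-zA-Z and A-Za-z are interchangeable.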

Bob