I am getting different results based on whether I precompile a regular expression:

>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean')
' Bean'
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE)
'Mr Bean'

The Python documentation says "Some of the functions are simplified versions of the full featured methods for compiled regular expressions." However, it also claims that RegexObject.sub() is "Identical to the sub() function."

So what is going on here?

+5  A: 

The module-level sub() call doesn't accept flags as its last positional argument. The fourth argument is "count", the maximum number of pattern occurrences to be replaced.
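
A minimal sketch of what that means in practice, assuming re.IGNORECASE has the integer value 2 (true in CPython, but not something to rely on). The sample string 'mr mr mr mr' is made up for illustration, and count is passed by keyword here because newer Python versions deprecate passing it positionally:

```python
import re

# re.IGNORECASE is an integer-valued flag; in CPython its value is 2.
flag_as_int = int(re.IGNORECASE)

# Putting the flag in the count slot therefore means "replace at most
# flag_as_int occurrences", while matching stays case-sensitive.
limited = re.sub('mr', '', 'mr mr mr mr', count=flag_as_int)

# Equivalent to the call from the question: 'mr' never matches 'Mr'
# without the flag, so nothing is replaced regardless of count.
unchanged = re.sub('mr', '', 'Mr Bean', count=flag_as_int)

print(flag_as_int)
print(repr(limited))
print(repr(unchanged))
```

So the flag is silently treated as a replacement limit rather than raising an error, which is why the result looks like the flag was ignored.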

zzzeek
+11  A: 

It appears that re.sub() can't accept re.IGNORECASE.

The documentation states:

sub(pattern, repl, string, count=0)

Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.  repl can be either a string or a callable;
if a string, backslash escapes in it are processed.  If it is
a callable, it's passed the match object and must return
a replacement string to be used.

Using this works in its place, however:

re.sub("(?i)mr", "", "Mr Bean")
Evan Fosmark
+4  A: 
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

In this version, re.sub has no parameter for regex flags (IGNORECASE, MULTILINE, DOTALL), unlike re.compile.

Alternatives:

>>> re.sub("[Mm]r", "", "Mr Bean")
' Bean'

>>> re.sub("(?i)mr", "", "Mr Bean")
' Bean'


Edit: Python 3.1 added support for regex flags in these functions (see http://docs.python.org/3.1/whatsnew/3.1.html). As of 3.1, the signature of e.g. re.sub looks like:

re.sub(pattern, repl, string[, count, flags])
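
On a version that has the flags parameter, a sketch like the following should give the same result as the precompiled pattern from the question. Passing flags by keyword is a choice made here so the value cannot land in the count slot again:

```python
import re

# Pass the flag by keyword so it is not mistaken for count.
result = re.sub('mr', '', 'Mr Bean', flags=re.IGNORECASE)

# For comparison: precompiling with the same flag, as in the question.
compiled_result = re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean')

print(repr(result))
print(result == compiled_result)
```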
The MYYN
+2  A: 

From the Python 2.6.4 documentation:

re.sub(pattern, repl, string[, count])

In that version, re.sub() doesn't take a flags argument to set the regex mode. If you want re.IGNORECASE, use re.compile(...).sub(...) or put an inline (?i) in the pattern.

Chinmay Kanchi