



I'm trying to delete all digits from a string. However, the code below also deletes digits contained inside words, which I don't want. I've tried many regular expressions with no success.


import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)


This must not b deletd, but the number at the end yes

+1  A: 

Add a space before the \d+.

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.

s = re.sub("^\d+\s|\s\d+\s|\s\d+$", " ", s)
Oh, thanks, it worked!
What about strings such as " 3at"?
Here's another 2 cases for your unit tests: '123 should be deleted.' and 'You have been 0wn3d'
Paul Hankin
'123, foo' still fails
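
The cases raised in these comments can be checked mechanically. This is only a sketch: the helper name `strip_numbers` is made up for illustration, the test strings are the ones quoted in the comments, and the pattern is the one from the edited answer written as a raw string.

```python
import re

# Pattern from the edited answer above, written as a raw string.
PATTERN = r"^\d+\s|\s\d+\s|\s\d+$"

def strip_numbers(text):
    # Hypothetical helper: replace each match with a single space.
    return re.sub(PATTERN, " ", text)

cases = [
    " 3at",
    "123 should be deleted.",
    "You have been 0wn3d",
    "123, foo",
]
for case in cases:
    print(repr(case), "->", repr(strip_numbers(case)))
```

The leading "123 " is replaced in the second string and the words containing digits are left alone, but '123, foo' comes back unchanged because the comma is not whitespace.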
+1  A: 

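If your number is always at the end of your strings, try: re.sub(r"\d+$", "", s)

Otherwise, you may try re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep only one or two of the spaces (\s matches any whitespace character). Note that the replacement must be a raw string, otherwise "\1\2" is read as two control characters rather than back-references.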

Raoul Supercopter
\W is probably better than \s for this. Also, a better variation would be "\b\d+\b" except that it fails to work for me!
+9  A: 

Try this:


That'll match only those digits that are not part of another word.
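
The pattern itself appears to be missing from this answer. Judging from the surrounding comments it was most likely the word-boundary form `\b\d+\b`, but that is an assumption. The sketch below uses a raw string so that `\b` reaches the regex engine as a word boundary; the helper name `remove_standalone_numbers` is made up for illustration.

```python
import re

def remove_standalone_numbers(text):
    # Assumed pattern: a run of digits with a word boundary on both sides.
    stripped = re.sub(r"\b\d+\b", "", text)
    # Collapse the whitespace left behind by the removed numbers.
    return " ".join(stripped.split())

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"
print(remove_standalone_numbers(s))
```

Collapsing whitespace afterwards is an extra step added here; drop it if the original spacing matters.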

This doesn't delete the first or last numbers for s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"
I just tested it with your string and I got the expected result. \b matches either the beginning of the string, the end, or anything that isn't a word character ([A-Za-z0-9_]). I tested it in IronPython though, don't know if there's something wrong with Python's handling of word boundaries
I haven't tried this, but could you do something like: [^\b]\d+[$\b]
sharth: that's essentially the same. \b will match at the beginning or end of the string already. It's a "null pattern" that matches "between" a word and a non-word. So re.sub(r'\b', '!', 'one two') gives "!one! !two!"

To handle digit strings at the beginning of a line as well:

s = re.sub(r"(^|\W)\d+", "", s)
Lance Richardson
+1  A: 

Using a literal space isn't very good, since it doesn't handle tabs, et al., and \s still misses punctuation and the ends of the string. A first cut at a better solution is:

re.sub(r"\b\d+\b", "", s)

Note that the pattern is a raw string because \b is normally the backspace escape for strings, and we want the special word boundary regex escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
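
The raw-string point is easy to check directly. A small sketch, with variable names chosen only for illustration:

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# Without the r prefix, "\b" is a single backspace character (0x08).
non_raw = "\b\\d+\b"   # the same pattern text that "\b\d+\b" would produce
raw = r"\b\d+\b"       # backslash + 'b': the regex word-boundary escape

unchanged = re.sub(non_raw, "", s)  # looks for literal backspaces, finds none
cleaned = re.sub(raw, "", s)        # removes the standalone number at the end

print("\b" == "\x08", len(r"\b"))
print(unchanged == s)
print(repr(cleaned))
```

This is probably why the non-raw "\b\d+\b" was reported as not working elsewhere in the thread.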


Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

Splits by " ", and checks if the chunk is a number by doing str().isdigit(), then joins them back together. More verbosely (not using a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)
+1  A: 

I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals,


The above should also handle things like,

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete - you probably need a more complete definition of what you can expect to find in the files you need to parse.
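
The pattern this answer refers to is not shown here, so the following is not a reconstruction of it. It is only one possible sketch of a regex that also accepts an optional minus sign and a decimal part; the name `strip_signed_numbers` and the exact lookarounds are assumptions made for illustration.

```python
import re

# Optional "-", digits, optional ".digits". The lookarounds reject a number
# that is glued to a word character or to another decimal part, so b3 and
# 0wn3d are left alone.
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?!\.?\w)")

def strip_signed_numbers(text):
    # Hypothetical helper, not taken from the original answer.
    return NUMBER.sub("", text)

s = "This must not b3 delet3d, but the number at the end yes -134.411"
print(repr(strip_signed_numbers(s)))
```

It makes no attempt to handle exponents, thousands separators, or a leading plus sign, which is the kind of thing a fuller definition of the input would have to settle.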

Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using so you need to be a little careful with that.
