ansaurus

Question

Answer 1

+1 A:

Add a space before the \d+.

>>> import re
>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)
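As a rough check, the pattern above can be run against the sample string and a few other inputs. This is only a sketch: the helper name strip_numbers and the extra sample strings are illustrative assumptions, and the pattern is written as a raw string.

```python
import re

# Pattern from the edit above, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(r"^\d+\s|\s\d+\s|\s\d+$")

def strip_numbers(s):
    """Replace standalone, whitespace-delimited numbers with one space."""
    return PATTERN.sub(" ", s)

samples = [
    "This must not b3 delet3d, but the number at the end yes 134411",
    "123 should be deleted.",
    "You have been 0wn3d",
    "123, foo",  # the comma is not whitespace, so this number is not matched
]

for sample in samples:
    print(repr(strip_numbers(sample)))
```

Numbers followed by punctuation are left alone because every alternative in the pattern requires whitespace or a string edge around the digits.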

oneporter 2009-05-03 14:04:05

Oh, thanks, it worked!

Menda 2009-05-03 14:07:18

What about strings such as " 3at"?

marcog 2009-05-03 14:11:55

Here's another 2 cases for your unit tests: '123 should be deleted.' and 'You have been 0wn3d'

Paul Hankin 2009-05-03 14:22:01

'123, foo' still fails

marcog 2009-05-03 15:26:03

Answer 2

+1 A:

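If your number is always at the end of your strings, try: re.sub(r"\d+$", "", s)

Otherwise, you may try re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep only one or both of the spaces (\s matches any whitespace character). Both arguments need to be raw strings; otherwise Python reads \1 and \2 as control characters rather than back-references.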
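A small sketch of how the back-references behave, assuming a made-up input string (the variable names are illustrative only):

```python
import re

s = "keep a1 but drop 42 here"

# Both captured whitespace characters are put back, so two spaces remain.
both = re.sub(r"(\s)\d+(\s)", r"\1\2", s)

# Only the first captured whitespace character is put back.
one = re.sub(r"(\s)\d+(\s)", r"\1", s)

# Without the r prefix, "\1\2" is two control characters, not back-references.
not_backrefs = "\1\2" == "\x01\x02"

print(repr(both))
print(repr(one))
print(not_backrefs)
```

The digit in "a1" is untouched because the pattern requires whitespace directly before the digits.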

Raoul Supercopter 2009-05-03 14:06:05

\W is probably better than \s for this. Also, a better variation would be "\b\d+\b", except that it fails to work for me!

dwc 2009-05-03 14:17:43

Answer 3

+9 A:

Try this:

"\b\d+\b"

That'll match only those digits that are not part of another word.

jrcalzada 2009-05-03 14:12:44

This doesn't delete the first or last numbers for s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

oneporter 2009-05-03 14:39:28

I just tested it with your string and I got the expected result. \b matches either the beginning of the string, the end, or anything that isn't a word character ([A-Za-z0-9_]). I tested it in IronPython though, so I don't know whether there's something wrong with Python's handling of word boundaries.

jrcalzada 2009-05-03 15:33:37

I haven't tried this, but could you do something like: [^\b]\d+[$\b]

sharth 2009-05-03 15:33:44

sharth: that's essentially the same. \b will match at the beginning or end of the string already. It's a "null pattern" that matches "between" a word and a non-word. So re.sub(r'\b', '!', 'one two') gives "!one! !two!"

dwc 2009-05-03 15:41:24

Answer 4

A:

To handle digit strings at the beginning of a line as well:

s = re.sub(r"(^|\W)\d+", "", s)

Lance Richardson 2009-05-03 14:23:58

Answer 5

+1 A:

Using \s isn't very good, since it only matches whitespace and misses other separators such as punctuation. A first cut at a better solution is:

re.sub(r"\b\d+\b", "", s)

Note that the pattern is a raw string because \b is normally the backspace escape for strings, and we want the special word boundary regex escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
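The raw-string point can be checked directly. The sketch below uses the longer test string from the comments on another answer; the final whitespace clean-up step is an addition for illustration, not part of the answer's approach.

```python
import re

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

# Non-raw string: each "\b" is a backspace character, so nothing matches.
unchanged = re.sub("\b[0-9]+\b", "", s)

# Raw string: "\b" reaches the regex engine as a word boundary.
stripped = re.sub(r"\b\d+\b", "", s)

# Optional clean-up (an addition here): collapse leftover whitespace.
tidy = " ".join(stripped.split())

print(unchanged == s)
print(repr(stripped))
print(repr(tidy))
```

A missing r prefix would explain why a pattern such as "\b\d+\b" appears to do nothing when typed as an ordinary Python string.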

dwc 2009-05-03 15:05:28

Answer 6

A:

Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

Splits on " ", checks whether each chunk is a number with str.isdigit(), then joins the remaining chunks back together. More verbosely (not using a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)

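A possible variation, offered as a sketch rather than as part of the answer: calling split() with no argument splits on any run of whitespace, so tabs and repeated spaces are handled too. The function name remove_number_words is hypothetical.

```python
def remove_number_words(s):
    """Drop whitespace-separated chunks that consist only of digits."""
    # split() with no argument splits on any whitespace and discards
    # empty chunks, so the result is rejoined with single spaces.
    return " ".join(word for word in s.split() if not word.isdigit())

print(remove_number_words(
    "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"
))
print(remove_number_words("tab\t42\tseparated"))
print(remove_number_words("123, foo"))  # "123," is not all digits, so it stays
```

The trade-off is that the original spacing is not preserved, and a number with attached punctuation is kept.
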
dbr 2009-05-03 15:21:27

Answer 7

+1 A:

I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals:

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)

The above should also handle things like:

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete - you probably need a more complete definition of what you can expect to find in the files you need to parse.

Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using so you need to be a little careful with that.
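For comparison, here is a different sketch (not the pattern above) that avoids \b altogether and only removes numbers delimited by whitespace or the string edges, including an optional minus sign and decimal point. The name remove_numbers and the exact definition of "number" are assumptions for illustration.

```python
import re

# A number here means: optional "-", then digits with an optional fractional
# part, or a bare fractional part such as ".5". The lookarounds require
# whitespace or a string edge on both sides.
NUMBER = re.compile(r"(?<!\S)-?(?:\d+\.?\d*|\.\d+)(?!\S)")

def remove_numbers(s):
    """Remove whitespace-delimited numbers, then collapse leftover spaces."""
    return " ".join(NUMBER.sub("", s).split())

print(remove_numbers(
    "This must not b3 delet3d, but the number at the end yes -134.411"
))
print(remove_numbers("x 3at 0wn3d 42 .5 -7 y"))
```

Numbers followed directly by punctuation, such as "123,", are deliberately left in place by this sketch, so it is still incomplete in the same sense as noted above.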

blackkettle 2009-05-03 15:37:32


Delete digits in Python (Regex)
