ansaurus

Question

How do we remove all non-numeric characters from a string in Python?

Answer 1

+9 A:

Not sure if this is the most efficient way, but:

>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'

The ''.join part combines all the resulting characters with nothing in between. The rest is a generator expression, where (as you can probably guess) we keep only the characters of the string for which isdigit is true.
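
As a minimal sketch, the same idea can be wrapped in a reusable function. The name keep_digits is hypothetical, and the sketch assumes Python 3, where str.isdigit also accepts some non-ASCII digit characters:

```python
def keep_digits(text):
    """Return only the characters of text for which str.isdigit() is true."""
    # The generator expression yields one character at a time;
    # ''.join concatenates them with an empty separator.
    return ''.join(c for c in text if c.isdigit())


print(keep_digits("abc123def456"))  # 123456
```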

Mark Rushakoff 2009-08-08 17:16:55

That does the opposite. I think you mean "not c.isdigit()"

Ryan Rosario 2009-08-08 17:19:51

Remove all non-numeric == keep only numeric.

Mark Rushakoff 2009-08-08 17:21:43

Answer 2

+7 A:

>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
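
If the same substitution runs many times, precompiling the pattern may help. The sketch below assumes Python 3; the names NON_DIGIT_RUN and strip_non_digits are hypothetical. Note that on Python 3 str patterns, \D is Unicode-aware, so it is not quite the same as [^0-9] unless re.ASCII is passed:

```python
import re

# One or more characters that are not ASCII digits.
NON_DIGIT_RUN = re.compile(r"[^0-9]+")


def strip_non_digits(text):
    """Delete every run of non-ASCII-digit characters from text."""
    return NON_DIGIT_RUN.sub("", text)


print(strip_non_digits("sdkjh987978asd098as0980a98sd"))  # 987978098098098

# \D treats U+0663 (ARABIC-INDIC DIGIT THREE) as a digit and keeps it;
# with re.ASCII only 0-9 count as digits, so it is removed.
unicode_aware = re.sub(r"\D", "", "a1\u0663")
ascii_only = re.sub(r"\D", "", "a1\u0663", flags=re.ASCII)
```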

Ned Batchelder 2009-08-08 17:25:21

that could be re.sub(r"\D", "", "sdkjh987978asd098as0980a98sd")

newacct 2009-08-08 19:07:25

Answer 3

+2 A:

Fastest approach, if you need to perform more than just one or two such removal operations (or even just one, but on a very long string!-), is to rely on the translate method of strings, even though it does need some prep:

>>> import string
>>> allchars = ''.join(chr(i) for i in xrange(256))
>>> identity = string.maketrans('', '')
>>> nondigits = allchars.translate(identity, string.digits)
>>> s = 'abc123def456'
>>> s.translate(identity, nondigits)
'123456'
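
The session above is Python 2. As a rough Python 3 sketch of the same byte-string technique (the names ASCII_DIGITS, NON_DIGITS and keep_digits_bytes are hypothetical), bytes.translate accepts None as the table and a second argument listing the bytes to delete:

```python
import string

# The ten ASCII digit bytes, and every other byte value from 0 to 255.
ASCII_DIGITS = string.digits.encode("ascii")
NON_DIGITS = bytes(b for b in range(256) if b not in ASCII_DIGITS)


def keep_digits_bytes(data):
    """Delete every byte of data that is not an ASCII digit."""
    # table=None means no remapping; the second argument lists bytes to drop.
    return data.translate(None, NON_DIGITS)


print(keep_digits_bytes(b"abc123def456"))  # b'123456'
```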

The translate method is different, and maybe a tad simpler to use, on Unicode strings than it is on byte strings, btw:

>>> unondig = dict.fromkeys(xrange(65536))
>>> for x in string.digits: del unondig[ord(x)]
... 
>>> s = u'abc123def456'
>>> s.translate(unondig)
u'123456'

You might want to use a mapping class rather than an actual dict, especially if your Unicode string may potentially contain characters with very high ord values (that would make the dict excessively large;-). For example:

>>> class keeponly(object):
...   def __init__(self, keep): 
...     self.keep = set(ord(c) for c in keep)
...   def __getitem__(self, key):
...     if key in self.keep:
...       return key
...     return None
... 
>>> s.translate(keeponly(string.digits))
u'123456'
>>>
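
For Python 3, where every str is Unicode, the mapping-class idea might look like the sketch below. The class name KeepOnly is hypothetical; the sketch relies on str.translate looking up each code point with __getitem__ and deleting characters that map to None:

```python
import string


class KeepOnly:
    """Translation table that keeps the given characters and deletes the rest."""

    def __init__(self, keep):
        # Store code points, because str.translate looks up ord(c), not c.
        self.keep = frozenset(ord(c) for c in keep)

    def __getitem__(self, codepoint):
        # Returning the code point keeps the character; None deletes it.
        return codepoint if codepoint in self.keep else None


print("abc123def456".translate(KeepOnly(string.digits)))  # 123456
```

No table covering the whole code-point range is ever built, so memory use depends only on the size of the keep set.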

Alex Martelli 2009-08-08 17:35:59

(1) Don't hard-code magic numbers; s/65536/sys.maxunicode/ (2) The dict is unconditionally "excessively large" because the input "may potentially" contain `(sys.maxunicode - number_of_non_numeric_chars)` entries. (3) consider whether string.digits may not be sufficient leading to a need to crack open the unicodedata module (4) consider re.sub(r'(?u)\D+', u'', text) for simplicity and potential speed.
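
To illustrate point (3), here is a small Python 3 sketch of how the built-in predicates disagree on non-ASCII characters. The sample characters and the helper name keep_ascii_digits are chosen for illustration only:

```python
# Each sample is classified by the three str predicates.
samples = {
    "ASCII digit seven": "7",
    "Arabic-Indic digit three": "\u0663",
    "superscript two": "\u00b2",
}

for label, ch in samples.items():
    # isdecimal is the strictest of the three, isnumeric the loosest.
    print(label, ch.isdecimal(), ch.isdigit(), ch.isnumeric())


def keep_ascii_digits(text):
    """Keep only the characters 0-9, ignoring all other Unicode digits."""
    return "".join(c for c in text if c in "0123456789")
```

Which predicate is right depends on whether the result must later be parsed by int(), which accepts decimal digits but rejects characters such as the superscript two.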

John Machin 2009-08-08 23:31:19

Answer 4

+3 A:

This should work for strings and unicode objects:

# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)

# python ≥3.0
def only_numerics(seq):
    seq_type = type(seq)
    return seq_type().join(filter(seq_type.isdigit, seq))
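
On Python 3, iterating over a bytes object yields integers, so the bytes.isdigit filter above does not apply to bytes input. A possible sketch that handles str, bytes and bytearray (the name only_numerics_py3 is hypothetical):

```python
def only_numerics_py3(seq):
    """Keep digit characters of a str, or ASCII digit bytes of bytes/bytearray."""
    if isinstance(seq, (bytes, bytearray)):
        # Bytes iterate as ints; 0x30..0x39 are the ASCII codes for '0'..'9'.
        return type(seq)(b for b in seq if 0x30 <= b <= 0x39)
    return "".join(filter(str.isdigit, seq))


print(only_numerics_py3("abc123def456"))   # 123456
print(only_numerics_py3(b"abc123def456"))  # b'123456'
```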

ΤΖΩΤΖΙΟΥ 2009-09-07 03:01:27

and only in python 2.x

SilentGhost 2009-09-07 09:09:49

Thank you for the reminder, SilentGhost.

ΤΖΩΤΖΙΟΥ 2009-09-08 01:02:57
