Quite a simple question, but I'm a python noobie. How do we remove all non-numeric characters from a string in Python?
How can I do that?
Quite a simple question, but I'm a python noobie. How do we remove all non-numeric characters from a string in Python?
How can I do that?
Not sure if this is the most efficient way, but:
>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'
The ''.join
part means to combine all the resulting characters together without any characters in between. Then the rest of it is a list comprehension, where (as you can probably guess) we only take the parts of the string that match the condition isdigit
.
>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
Fastest approach, if you need to perform more than just one or two such removal operations (or even just one, but on a very long string!-), is to rely on the translate
method of strings, even though it does need some prep:
>>> import string
>>> allchars = ''.join(chr(i) for i in xrange(256))
>>> identity = string.maketrans('', '')
>>> nondigits = allchars.translate(identity, string.digits)
>>> s = 'abc123def456'
>>> s.translate(identity, nondigits)
'123456'
The translate
method is different, and maybe a tad simpler simpler to use, on Unicode strings than it is on byte strings, btw:
>>> unondig = dict.fromkeys(xrange(65536))
>>> for x in string.digits: del unondig[ord(x)]
...
>>> s = u'abc123def456'
>>> s.translate(unondig)
u'123456'
You might want to use a mapping class rather than an actual dict, especially if your Unicode string may potentially contain characters with very high ord values (that would make the dict excessively large;-). For example:
>>> class keeponly(object):
... def __init__(self, keep):
... self.keep = set(ord(c) for c in keep)
... def __getitem__(self, key):
... if key in self.keep:
... return key
... return None
...
>>> s.translate(keeponly(string.digits))
u'123456'
>>>
This should work for strings and unicode objects:
# python <3.0
def only_numerics(seq):
return filter(type(seq).isdigit, seq)\
# python ≥3.0
def only_numerics(seq):
seq_type= type(seq)
return seq_type().join(filter(seq_type.isdigit, seq))