I need to replace non-numeric chars from a string.
For example, "8-4545-225-144" needs to be "84545225144"; "$334fdf890==-" must be "334890".
How can I do this?
I need to replace non-numeric chars from a string.
For example, "8-4545-225-144" needs to be "84545225144"; "$334fdf890==-" must be "334890".
How can I do this?
It is possible with regex.
import re
...
return re.sub(r'\D', '', theString)
Although a little more complicated to set up, using the translate()
string method to delete the characters as shown below can as much as 4-6 times faster than using join()
or re.sub()
according to timing tests I performed -- so if it is something done many times, you might want to consider using this instead.
nonnumerics = ''.join(c for c in ''.join(chr(i) for i in range(256)) if not c.isdigit())
astring = '123-$ab #6789'
print astring.translate(None, nonnumerics)
# 1236789
I prefer regular expressions, so here's a way if you like
import re
myStr = '$334fdf890==-'
digts = re.sub('[^0-9]','',myStr)
This should replace all nonnumeric occurences with '' i.e. with nothing. So digts variable should be '334890'
Let's time the join
and the re
versions:
In [3]: import re
In [4]: def withRe(theString): return re.sub('\D', '', theString)
...:
In [5]:
In [6]: def withJoin(S): return ''.join(c for c in S if c.isdigit())
...:
In [11]: s = "8-4545-225-144"
In [12]: %timeit withJoin(s)
100000 loops, best of 3: 6.89 us per loop
In [13]: %timeit withRe(s)
100000 loops, best of 3: 4.77 us per loop
The join
version is much nicer, compared to the re
one, but unfortunately is 50% slower. So if the performance is an issue, the elegance might need to be sacrificed.
EDIT
In [16]: def withFilter(s): return filter(str.isdigit, s)
....:
In [19]: %timeit withFilter(s)
100000 loops, best of 3: 2.75 us per loop
It looks like filter
is the performance and readability winner
filter(str.isdigit, s)
is faster and IMO clearer than anything else listed here.
It will also throw a TypeError if s is a unicode type. Depending on what definition of "digits" you want, this can be more or less useful than the alternative filter(type(s).isdigit, s)
, slightly slower but still faster than the re and comprehension versions for me.
Edit: Although if you are a poor sucker stuck with Python 3, you will need to use "".join(filter(str.isdigit, s))
which puts you firmly in the realm of equivalently bad performance. Such progress.