views:

135

answers:

6

I need to replace non-numeric chars from a string.

For example, "8-4545-225-144" needs to be "84545225144"; "$334fdf890==-" must be "334890".

How can I do this?

+10  A: 
''.join(c for c in S if c.isdigit())
Ignacio Vazquez-Abrams
I hate it when people slam a list comprehension when a generator expression is the right option. You, sir, win one internet (and an upvote).
delnan
+1 for non-regex solution.i hate when peoples use regex for everything,some goals is better suited with non-regex solution
killown
@killown: I personally find the regex option more readable. Maybe it's because I've done so much work with regexes, but looking at @KennyTM's code I can immediately see what it does; this one takes me a second to understand.
musicfreak
+11  A: 

It is possible with regex.

import re

...

return re.sub(r'\D', '', theString)
KennyTM
A: 

Although a little more complicated to set up, using the translate() string method to delete the characters as shown below can as much as 4-6 times faster than using join() or re.sub() according to timing tests I performed -- so if it is something done many times, you might want to consider using this instead.

nonnumerics = ''.join(c for c in ''.join(chr(i) for i in range(256)) if not c.isdigit())

astring = '123-$ab #6789'
print astring.translate(None, nonnumerics)
# 1236789
martineau
The str.translate method is preferred over string.translate.
Roger Pate
@Roger Pate, good point, old habits are hard to break... I've updated the code in my answer accordingly. Thanks for pointing it out and allowing me to improve the sample code.
martineau
+1  A: 

I prefer regular expressions, so here's a way if you like

import re
myStr = '$334fdf890==-'
digts = re.sub('[^0-9]','',myStr) 

This should replace all nonnumeric occurences with '' i.e. with nothing. So digts variable should be '334890'

reddy
A: 

Let's time the join and the re versions:

In [3]: import re

In [4]: def withRe(theString): return re.sub('\D', '', theString)
   ...:

In [5]:

In [6]: def withJoin(S): return ''.join(c for c in S if c.isdigit())
   ...:


In [11]: s = "8-4545-225-144"

In [12]: %timeit withJoin(s)
100000 loops, best of 3: 6.89 us per loop

In [13]: %timeit withRe(s)
100000 loops, best of 3: 4.77 us per loop

The join version is much nicer, compared to the re one, but unfortunately is 50% slower. So if the performance is an issue, the elegance might need to be sacrificed.

EDIT

In [16]: def withFilter(s): return filter(str.isdigit, s)
   ....:
In [19]: %timeit withFilter(s)
100000 loops, best of 3: 2.75 us per loop

It looks like filter is the performance and readability winner

bgbg
+2  A: 

filter(str.isdigit, s) is faster and IMO clearer than anything else listed here.

It will also throw a TypeError if s is a unicode type. Depending on what definition of "digits" you want, this can be more or less useful than the alternative filter(type(s).isdigit, s), slightly slower but still faster than the re and comprehension versions for me.

Edit: Although if you are a poor sucker stuck with Python 3, you will need to use "".join(filter(str.isdigit, s)) which puts you firmly in the realm of equivalently bad performance. Such progress.

Joe