ansaurus

Question

Answer 1

+10 A:

''.join(c for c in S if c.isdigit())
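
As a minimal sketch of how this one-liner might be wrapped and called: the function names `keep_digits` and `keep_ascii_digits` and the sample inputs are assumptions for illustration, not part of the answer. The second function is a stricter variant that accepts only the ten ASCII digits.

```python
def keep_digits(s):
    # Keep each character only if str.isdigit() says it is a digit.
    return ''.join(c for c in s if c.isdigit())


def keep_ascii_digits(s):
    # Stricter variant: accept only the ten ASCII digits.
    return ''.join(c for c in s if c in '0123456789')


print(keep_digits('8-4545-225-144'))       # 84545225144
print(keep_ascii_digits('tel: 555-0100'))  # 5550100
```

On Python 3, str.isdigit() is also true for some non-ASCII characters (superscript digits, for example), so the two functions can differ on such input.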

Ignacio Vazquez-Abrams 2010-09-04 16:21:35

I hate it when people slam a list comprehension when a generator expression is the right option. You, sir, win one internet (and an upvote).

delnan 2010-09-04 16:31:19

+1 for the non-regex solution. I hate it when people use regex for everything; some goals are better suited to a non-regex solution.

killown 2010-09-04 20:21:56

@killown: I personally find the regex option more readable. Maybe it's because I've done so much work with regexes, but looking at @KennyTM's code I can immediately see what it does; this one takes me a second to understand.

musicfreak 2010-09-19 09:29:45

Answer 2

+11 A:

It is possible with regex.

import re

...

return re.sub(r'\D', '', theString)
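
A self-contained sketch of the same idea, in case a runnable version helps. The function name `strip_non_digits` and the `ascii_only` parameter are hypothetical additions, not part of the answer. On Python 3, `\D` applied to a str also leaves non-ASCII decimal digits in place; passing `re.ASCII` restricts the match to `[^0-9]`.

```python
import re


def strip_non_digits(the_string, ascii_only=False):
    # \D matches any character that is not a digit; replacing each
    # match with '' leaves only the digits behind.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r'\D', '', the_string, flags=flags)


print(strip_non_digits('123-$ab #6789'))  # 1236789
```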

KennyTM 2010-09-04 16:22:31

Answer 3

A:

Although a little more complicated to set up, using the translate() string method to delete the characters, as shown below, can be as much as 4-6 times faster than using join() or re.sub(), according to timing tests I performed. So if this is something done many times, you might want to consider using it instead.

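# Python 2: build a string of every 8-bit character that is not a digit.
nonnumerics = ''.join(c for c in ''.join(chr(i) for i in range(256)) if not c.isdigit())

astring = '123-$ab #6789'
# str.translate(table, deletechars): a table of None deletes without remapping.
print astring.translate(None, nonnumerics)
# 1236789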
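
For readers on Python 3, where str.translate() takes only a table argument, a rough equivalent might look like the sketch below. The names `non_digits` and `delete_table` are assumptions for illustration. This version deletes only code points below 256, so characters outside that range pass through untouched, and no timing claim is made for it.

```python
# Python 3 sketch: collect every code point below 256 that is not an
# ASCII digit, then build a table that maps each of them to None.
non_digits = ''.join(chr(i) for i in range(256) if chr(i) not in '0123456789')
delete_table = str.maketrans('', '', non_digits)

astring = '123-$ab #6789'
print(astring.translate(delete_table))  # 1236789
```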

martineau 2010-09-04 19:30:19

The str.translate method is preferred over string.translate.

Roger Pate 2010-09-18 20:04:02

@Roger Pate, good point, old habits are hard to break... I've updated the code in my answer accordingly. Thanks for pointing it out and allowing me to improve the sample code.

martineau 2010-09-19 07:37:14

Answer 4

+1 A:

I prefer regular expressions, so here's a way if you like:

import re
myStr = '$334fdf890==-'
digits = re.sub(r'[^0-9]', '', myStr)

This should replace all non-numeric occurrences with '', i.e. with nothing. So the digits variable should be '334890'.

reddy 2010-09-05 18:19:14

Answer 5

A:

Let's time the join and the re versions:

In [3]: import re

In [4]: def withRe(theString): return re.sub(r'\D', '', theString)
   ...:

In [5]:

In [6]: def withJoin(S): return ''.join(c for c in S if c.isdigit())
   ...:


In [11]: s = "8-4545-225-144"

In [12]: %timeit withJoin(s)
100000 loops, best of 3: 6.89 us per loop

In [13]: %timeit withRe(s)
100000 loops, best of 3: 4.77 us per loop

The join version is much nicer than the re one, but unfortunately it is roughly 45% slower (6.89 us vs. 4.77 us per loop). So if performance is an issue, elegance might need to be sacrificed.

EDIT

In [16]: def withFilter(s): return filter(str.isdigit, s)
   ....:
In [19]: %timeit withFilter(s)
100000 loops, best of 3: 2.75 us per loop

It looks like filter is the winner on both performance and readability.
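
The numbers above come from one machine and one interpreter, so they may not carry over. A sketch for rerunning the comparison outside IPython with the standard timeit module follows; the snake_case function names are assumptions, and the filter version is wrapped in join so that it also returns a string on Python 3.

```python
import re
import timeit


def with_re(s):
    return re.sub(r'\D', '', s)


def with_join(s):
    return ''.join(c for c in s if c.isdigit())


def with_filter(s):
    # On Python 3, filter() returns an iterator, so join is needed.
    return ''.join(filter(str.isdigit, s))


s = "8-4545-225-144"
loops = 100_000

for fn in (with_re, with_join, with_filter):
    # Total seconds for all loops, converted to microseconds per call.
    seconds = timeit.timeit(lambda: fn(s), number=loops)
    print(f"{fn.__name__}: {seconds / loops * 1e6:.2f} us per loop")
```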

bgbg 2010-09-19 08:03:07

Answer 6

+2 A:

filter(str.isdigit, s) is faster and IMO clearer than anything else listed here.

It will also throw a TypeError if s is a unicode type. Depending on what definition of "digits" you want, this can be more or less useful than the alternative filter(type(s).isdigit, s), which is slightly slower but still faster than the re and comprehension versions for me.

Edit: Although if you are a poor sucker stuck with Python 3, you will need to use "".join(filter(str.isdigit, s)) which puts you firmly in the realm of equivalently bad performance. Such progress.
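
A small sketch of the Python 3 behaviour described in the edit, using the sample string from the timing answer; the variable names are assumptions. It shows that filter() no longer returns a str there, which is why the join is needed.

```python
s = "8-4545-225-144"

# On Python 3, filter() yields a lazy iterator rather than a str.
filtered = filter(str.isdigit, s)
print(isinstance(filtered, str))  # False

# Joining the iterator produces the digit string.
digits = "".join(filtered)
print(digits)  # 84545225144
```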

Joe 2010-09-19 08:34:34


Replace non-numeric characters
