ansaurus

Question

In Python, How Do You Filter a String Such That Only Characters in Your List Are Returned?

Answer 1

+2 A:

s = 'ASDjifjASFJ7364'
s_lowercase = ''.join(filter(lambda c: c.islower(), s))
print(s_lowercase)  # prints jifj

jcoon 2009-05-15 19:53:32

There is no need to call list on s. String objects are iterable.

Ayman Hourieh 2009-05-15 20:03:39

Answer 2

+12 A:

s = 'Agh#$%#%2341- -!zdrkfd'  
print(''.join(c for c in s if c.islower()))

String objects are iterable; there is no need to "explode" the string into a list. You can put whatever condition you want in the generator expression, and it will filter characters accordingly.
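As a sketch of that generality, the condition can test membership in an arbitrary collection of allowed characters. The helper name `keep_only` and its parameters are illustrative assumptions, not part of the original answer, and the code assumes Python 3.

```python
def keep_only(text, allowed):
    """Return text with every character not in `allowed` removed."""
    allowed_set = set(allowed)  # a set gives constant-time membership tests
    return ''.join(c for c in text if c in allowed_set)

s = 'Agh#$%#%2341- -!zdrkfd'
lowercase_only = keep_only(s, 'abcdefghijklmnopqrstuvwxyz')
custom = keep_only(s, ['A', 'g', '2', '!'])
```

`allowed` can be a string, list, or any other iterable of single characters.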

You could also implement this using a regex, but this will only hide the loop. The regular expressions library will still have to loop through the characters of the string in order to filter them.

Ayman Hourieh 2009-05-15 19:53:36

isalpha() is not needed because non-alpha characters will return false on islower()

jcoon 2009-05-15 19:55:16

@coonj Good point. Fixed.

Ayman Hourieh 2009-05-15 19:58:51

This can also be modified to work with a custom character list by changing `c.islower()` to e.g. `c in "abcDEF"`.

Ben Blank 2009-05-15 22:14:01

Well, darn it. I thought I had the better answer, but this is simpler. Incorporating Ben Blank's comment makes the answer suitably general. I assumed I had to make my list first, but that is not needed at all.

PyNEwbie 2009-05-16 01:39:58

Answer 3

+1 A:

I'd use a regex. For lowercase match [a-z].
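A minimal sketch of what that might look like, assuming Python 3 and the standard `re` module; the variable names are illustrative.

```python
import re

s = 'Agh#$%#%2341- -!zdrkfd'
# findall returns every single-character match of [a-z], in order
lowercase_only = ''.join(re.findall(r'[a-z]', s))
```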

Oli 2009-05-15 19:54:27

Answer 4

+3 A:

>>> s = 'Agh#$%#%2341- -!zdrkfd'
>>> ''.join(i for i in s if  i in 'qwertyuiopasdfghjklzxcvbnm')
'ghzdrkfd'

Nixuz 2009-05-15 19:57:34

Answer 5

+4 A:

Using a regular expression is easy enough, especially for this scenario:

>>> import re
>>> s = 'ASDjifjASFJ7364'
>>> re.sub(r'[^a-z]+', '', s)
'jifj'

If you plan on doing this many times, it is best to compile the regular expression beforehand:

>>> import re
>>> s = 'ASDjifjASFJ7364'
>>> r = re.compile(r'[^a-z]+')
>>> r.sub('', s)
'jifj'
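When the allowed characters come from a list rather than a fixed range, the character class can be built programmatically. This is a sketch under assumptions: Python 3, a non-empty list of allowed characters, and a hypothetical helper name `make_filter`. `re.escape` keeps characters such as `]`, `^`, and `-` from being read as character-class syntax.

```python
import re

def make_filter(allowed):
    """Compile a pattern matching runs of characters NOT in `allowed`."""
    # Escape each character so it is taken literally inside [...]
    char_class = ''.join(re.escape(c) for c in allowed)
    return re.compile('[^' + char_class + ']+')

pattern = make_filter(['a', 'b', '-', ']', '^'])
filtered = pattern.sub('', 'a-b]c^d e')
```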

Paolo Bergantino 2009-05-15 19:58:27

To be fair I ran the test again on your pre-compiled version and it is still slower than translate.

Nadia Alramli 2009-05-15 22:33:49

The regex should be '[^a-z]+' - this significantly improves performance.

gnud 2009-05-15 23:12:51

@gnud, you are right about improving performance. But it is still much slower than translate.

Nadia Alramli 2009-05-15 23:27:33

Thanks, gnud, fixed.

Paolo Bergantino 2009-05-15 23:38:25

Answer 6

A:

import string
print("".join([c for c in "Agh#$%#%2341- -!zdrkfd" if c in string.ascii_lowercase]))

2009-05-15 20:04:29

Answer 7

+14 A:

If you are looking for efficiency, the translate function is the fastest you can get.

It can be used to quickly replace characters and/or delete them.

import string
delete_table  = string.maketrans(
    string.ascii_lowercase, ' ' * len(string.ascii_lowercase)
)
table = string.maketrans('', '')

"Agh#$%#%2341- -!zdrkfd".translate(table, delete_table)

As of Python 2.6, the second table is no longer needed; pass None in its place:

import string
delete_table  = string.maketrans(
    string.ascii_lowercase, ' ' * len(string.ascii_lowercase)
)
"Agh#$%#%2341- -!zdrkfd".translate(None, delete_table)

This method is way faster than any other. Of course, you need to store delete_table somewhere and reuse it. But even if you don't store it and rebuild it every time, it is still going to be faster than the other methods suggested so far.
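`string.maketrans` and the two-argument `str.translate` shown above are Python 2 APIs. In Python 3, `str.translate` takes a single mapping from code points to replacements, where `None` means "delete". One possible sketch, assuming Python 3; the class name `KeepOnly` is hypothetical and not from the original answer, and no claim is made here about its speed relative to the Python 2 version.

```python
import string

class KeepOnly(dict):
    """Translation table that keeps the listed characters and deletes the rest."""

    def __init__(self, allowed):
        # Map each allowed code point to itself so it passes through unchanged
        super().__init__((ord(c), ord(c)) for c in allowed)

    def __missing__(self, codepoint):
        # Any code point not listed maps to None, which translate() deletes
        return None

keep_lowercase = KeepOnly(string.ascii_lowercase)
result = "Agh#$%#%2341- -!zdrkfd".translate(keep_lowercase)
```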

To confirm my claims here are the results:

for i in xrange(10000):
    ''.join(c for c in s if c.islower())

real    0m0.189s
user    0m0.176s
sys     0m0.012s

While running the regular expression solution:

for i in xrange(10000):
    re.sub(r'[^a-z]', '', s)

real    0m0.172s
user    0m0.164s
sys     0m0.004s

[Upon request] If you pre-compile the regular expression:

r = re.compile(r'[^a-z]')
for i in xrange(10000):
    r.sub('', s)

real    0m0.166s
user    0m0.144s
sys     0m0.008s

Running the translate method the same number of times took:

real    0m0.075s
user    0m0.064s
sys     0m0.012s
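The real/user/sys figures above look like output from the shell's `time` command under Python 2. A rough way to repeat a comparison like this from within Python is the standard `timeit` module. The sketch below assumes Python 3, so `translate` is exercised through a `bytes` deletion set as a stand-in for the Python 2 call. The function names are illustrative, and absolute numbers will differ by machine and interpreter version.

```python
import re
import string
import timeit

s = 'Agh#$%#%2341- -!zdrkfd'
pattern = re.compile(r'[^a-z]+')

# Every byte value except ASCII lowercase letters; used as the delete set
keep = string.ascii_lowercase.encode('ascii')
delete_bytes = bytes(b for b in range(256) if b not in keep)
s_bytes = s.encode('ascii')

def with_join():
    return ''.join(c for c in s if c.islower())

def with_regex():
    return pattern.sub('', s)

def with_translate():
    # bytes.translate(None, delete) removes every byte listed in `delete`;
    # the decode step is included in the measured time
    return s_bytes.translate(None, delete_bytes).decode('ascii')

for func in (with_join, with_regex, with_translate):
    seconds = timeit.timeit(func, number=10000)
    print('%-15s %.4f s' % (func.__name__, seconds))
```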

Nadia Alramli 2009-05-15 20:35:58

To be fair you should compile the regex outside the loop.

Unknown 2009-05-15 21:10:16

I'm comparing the top suggested solutions. That's how Paolo Bergantino wrote his expression.

Nadia Alramli 2009-05-15 21:13:00

I wrote it as a one-off solution, it would obviously be best compiled, so you should compare it as such.

Paolo Bergantino 2009-05-15 22:07:47

I ran the test again with a pre-compiled expression. As you can see, it is still more than twice as slow as translate.

Nadia Alramli 2009-05-15 22:32:25

In order to be fair, change the regex to '[^a-z]+'. That way, it will replace a series of matches in one go, instead of one character at a time.

gnud 2009-05-15 23:12:22

@gnud, I tried that; it is a little faster but no match for translate. By the way, the larger the string is, the bigger the difference in performance between translate and the other methods. With translate, the processing time hardly grows with string length.

Nadia Alramli 2009-05-15 23:25:46

Answer 8

A:

Here's one solution if you are specifically interested in working on strings:

s = 'Agh#$%#%2341- -!zdrkfd'
# Build the allowed characters 'a'..'z' from their code points
lowercase_chars = [chr(i) for i in range(ord('a'), ord('z') + 1)]
whitelist = set(lowercase_chars)
filtered_list = [c for c in s if c in whitelist]

The whitelist is actually a set (not a list) for efficiency.

If you need a string, use join():

filtered_str = ''.join(filtered_list)

filter() is a more generic solution. From the documentation (http://docs.python.org/library/functions.html):

filter(function, iterable)

Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

This would be one way of using filter():

filtered_str = filter(lambda c: c.islower(), s)  # in Python 2, filtering a str returns a str
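The quoted documentation describes Python 2. In Python 3, `filter()` returns a lazy iterator regardless of the input type, so the result needs `''.join()` to become a string again. A sketch assuming Python 3, with illustrative variable names:

```python
s = 'Agh#$%#%2341- -!zdrkfd'
whitelist = set('abcdefghijklmnopqrstuvwxyz')

# filter() yields only the characters for which the predicate is true
filtered_str = ''.join(filter(whitelist.__contains__, s))
```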

2009-05-15 20:59:53
