I have a list:

a = ['a','b','c'.........'A','B','C'.........'Z']

and I have a string:

string1= 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'

I want to keep ONLY those characters in string1 that exist in a.

What is the most efficient way to do this? Perhaps instead of having a be a list, I should just make it a string, like this: a='abcdefg..........ABC..Z'?

+4  A: 
''.join([s for s in string1 if s in a])

Explanation:

[s for s in string1 if s in a]

creates a list of all characters in string1, but only if they are also in the list a.

''.join([...])

turns it back into a string by joining it with nothing ('') in between the elements of the given list.

Ofri Raviv
+1 for synchronicity!
jathanism
Sorry, can you explain how this works, chaver?
I__
added explanation.
Ofri Raviv
@I__ you should be aware that this becomes noticeably slow if `a` is sufficiently large, because `s in a` scans the whole list for every character. You should make `a` a set or go for the `re` method I've presented. The `re` method is usually the fastest if `a` is large.
OTZ
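
A minimal sketch of the set suggestion from the comment above, written for Python 3. The helper name keep_allowed is made up for illustration and does not appear in the thread; it assumes a is any iterable of single characters.

```python
def keep_allowed(text, allowed):
    """Return text without the characters missing from allowed (hypothetical helper)."""
    # Membership tests on a set take constant time on average,
    # whereas `ch in some_list` scans the list for every character.
    allowed_set = frozenset(allowed)
    return ''.join([ch for ch in text if ch in allowed_set])


string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
a = ['E', 'i', 'W']
print(keep_allowed(string1, a))  # EiiWW
```

With only a handful of allowed characters the difference is likely negligible; the set mainly pays off once a grows.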
+2  A: 

List comprehension to the rescue!

wanted = ''.join(letter for letter in string1 if letter in a)

(Note that when a comprehension is the sole argument to a function you can omit the square brackets, so that the full list isn't built before the function sees it. The result is the same here, but without the brackets this is called a generator expression rather than a list comprehension.)
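
To make the distinction concrete, here is a small Python 3 sketch comparing the two forms. The variable names as_list, as_gen, from_list, from_gen and second_pass are illustrative only and not part of the answer.

```python
import types

string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
a = ['E', 'i', 'W']

# Square brackets: the whole list is built immediately.
as_list = [letter for letter in string1 if letter in a]

# Parentheses (or no brackets inside a call): items are produced lazily.
as_gen = (letter for letter in string1 if letter in a)

from_list = ''.join(as_list)
from_gen = ''.join(as_gen)

# A generator can only be consumed once, so a second join finds nothing left.
second_pass = ''.join(as_gen)

print(from_list, from_gen, repr(second_pass))  # EiiWW EiiWW ''
print(isinstance(as_gen, types.GeneratorType))  # True
```

Both forms give the same string; the practical difference is that the list can be reused while the generator is spent after one pass.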

jathanism
If you omit the square brackets, it's a generator expression. So you should link to generator expressions instead.
gnibbler
Good call. Done and done!
jathanism
+5  A: 

This should be faster.

>>> import re
>>> string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
>>> a = ['E', 'i', 'W']
>>> r = re.compile('[^%s]+' % ''.join(re.escape(c) for c in a))
>>> print r.sub('', string1)
EiiWW
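
For readers on Python 3, the same regex idea might be sketched as follows. keep_with_regex is a hypothetical name; the use of re.escape and the empty-list guard are additions made for this sketch, so that characters such as ']' or '^' in a cannot break the character class.

```python
import re


def keep_with_regex(text, allowed):
    """Delete every run of characters not listed in allowed (hypothetical helper)."""
    if not allowed:
        # An empty class '[^]' is not a valid pattern, so handle this case directly.
        return ''
    # Escape each character so regex metacharacters are taken literally.
    char_class = ''.join(re.escape(ch) for ch in allowed)
    pattern = re.compile('[^%s]+' % char_class)
    return pattern.sub('', text)


string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
print(keep_with_regex(string1, ['E', 'i', 'W']))  # EiiWW
```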

This is even faster than that.

>>> keep = set(a)
>>> all_else = ''.join(chr(i) for i in range(256) if chr(i) not in keep)
>>> string1.translate(None, all_else)
'EiiWW'

44 microsec for the `re` version vs 13 microsec for `translate` on my laptop.

How about that?

(Edit: it turned out that translate yields the best performance.)
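
The translate call above relies on the Python 2 signature str.translate(table, deletechars). On Python 3, str.translate takes a single table mapping code points to replacements, so a rough equivalent might look like the sketch below. keep_only is a hypothetical name, and building the table from the characters that actually occur in the text (rather than from range(256)) is a choice made for this sketch, not something from the answer.

```python
def keep_only(text, allowed):
    """Python 3 sketch: drop every character of text that is not in allowed."""
    allowed_set = set(allowed)
    # Mapping a code point to None tells str.translate to delete that character.
    table = {ord(ch): None for ch in set(text) if ch not in allowed_set}
    return text.translate(table)


string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
a = ['E', 'i', 'W']
print(keep_only(string1, a))  # EiiWW
```

No timing claim is made for this version; the 44 vs 13 microsecond figures apply to the Python 2 code in the answer.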

OTZ
+1  A: 

If you are going to do this with large strings, there is a faster solution using translate; see this answer.

katrielalex
A: 

@katrielalex: To spell it out:

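import string

string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'

# Every 8-bit character that is not a letter (string.letters is Python 2 only)
non_letters = ''.join(chr(i) for i in range(256) if chr(i) not in string.letters)
# Python 2 str.translate(table, deletechars): table=None deletes without remapping
print string1.translate(None, non_letters)

print 'Simpler, but possibly less correct'
print string1.translate(None, string.punctuation + string.digits + string.whitespace)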
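
string.letters and the two-argument translate exist only on Python 2. A rough Python 3 counterpart of both variants, with hypothetical helper names and assuming only ASCII letters are wanted, could look like the sketch below. It also shows one way the deny-list variant can be "less correct": a character that is neither punctuation, a digit, nor whitespace (a NUL character here) slips through.

```python
import string

# Deny-list: delete punctuation, digits and whitespace; everything else is kept.
DENY_TABLE = str.maketrans('', '', string.punctuation + string.digits + string.whitespace)


def strip_denylist(text):
    """Remove only the characters named in the deny-list (hypothetical helper)."""
    return text.translate(DENY_TABLE)


def keep_ascii_letters(text):
    """Allow-list: keep ASCII letters only (hypothetical helper)."""
    allowed = set(string.ascii_letters)
    return ''.join(ch for ch in text if ch in allowed)


string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
print(strip_denylist(string1))      # sERGdfhliisgdfjkskjdfWJWLIr
print(keep_ascii_letters(string1))  # sERGdfhliisgdfjkskjdfWJWLIr

# The two differ on input the deny-list did not anticipate.
print(repr(strip_denylist('a\x00b')))      # 'a\x00b'
print(repr(keep_ascii_letters('a\x00b')))  # 'ab'
```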
Tony Veijalainen