views: 2189, answers: 3

I want to split a string of email addresses that may be separated by any combination of commas and whitespace.

I thought it would be pretty straightforward:

import re

sep = re.compile(r'(\s*,*)+')
print(sep.split("""[email protected], [email protected]

   [email protected],,[email protected]"""))

But it isn't. Every regex I try leaves empty slots in the result, like this:

['[email protected]', '', '[email protected]', '', '[email protected]', '', '[email protected]']

I've tried various combinations, but none of them work. Is this actually possible with a regex?

+8  A: 

Doh!

It's just this.

sep = re.compile(r'[\s,]+')
interstar
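A minimal, self-contained sketch of the one-liner above. The addresses are made-up placeholders. One caveat worth knowing: a separator at the very start or end of the input still produces an empty string at that end, so the sketch also shows filtering those out.

```python
import re

# Split on runs of one or more commas and/or whitespace characters.
SEP = re.compile(r"[\s,]+")

# Hypothetical sample input; the addresses are invented for illustration.
raw = """alice@example.com, bob@example.com

   carol@example.com,,dave@example.com"""

addresses = SEP.split(raw)
print(addresses)
# ['alice@example.com', 'bob@example.com', 'carol@example.com', 'dave@example.com']

# Leading or trailing separators yield empty strings at the ends,
# so drop empty items if the input might start or end with one.
padded = "  alice@example.com, bob@example.com,\n"
cleaned = [a for a in SEP.split(padded) if a]
print(cleaned)  # ['alice@example.com', 'bob@example.com']
```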
In Perl (and in Python, which behaves the same way here), using parentheses in a split regex makes split() keep whatever the parenthesized part matched, returning it in the list between the items you want. So avoid capturing parentheses in a split pattern.
Chris Lutz
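Python's `re.split` documents this: the text matched by capturing groups is included in the result. In the question's output the items sit at even positions and each `''` at an odd position is the group's capture, which suggests the repeated group's last iteration matched nothing. A small sketch with placeholder strings, also showing that a non-capturing group `(?:...)` groups without adding anything to the result:

```python
import re

text = "a, b,,c  d"  # placeholder items

# Capturing group: re.split() also returns what the group matched,
# interleaved with the pieces.
with_group = re.split(r"([\s,]+)", text)
print(with_group)  # ['a', ', ', 'b', ',,', 'c', '  ', 'd']

# Non-capturing group (?:...): same grouping, but nothing extra is returned.
without_group = re.split(r"(?:[\s,])+", text)
print(without_group)  # ['a', 'b', 'c', 'd']
```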
A: 

I like the following...

>>> import re
>>> sep = re.compile(r',*\s*')
>>> sep.split("""[email protected], [email protected]

   [email protected],,[email protected]""")
['[email protected]', '[email protected]', '[email protected]', '[email protected]']

This also seems to work on your test data.

S.Lott
+1: I don't know why this was downvoted before, but it works quite nicely.
tgray
That regex will match the empty string, since it uses star quantifiers for everything. Really you want to split on at least one character; the OP's solution with a character class and plus quantifier is better, not to mention much clearer to read.
kquinn
I see. I don't think regular expressions can be ranked on readability, but I get your point about matching at least one character.
tgray
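kquinn's point can be checked directly. This sketch confirms that `,*\s*` matches the empty string and, assuming Python 3.7 or later (where `re.split()` also splits on empty matches), shows that such a pattern no longer keeps the addresses intact, while `[\s,]+` still does. The sample string is a placeholder, and the output recorded in the answer above predates that change.

```python
import re

text = "a@x.org, b@x.org"  # placeholder addresses

# Every part of ',*\s*' is optional, so it matches the empty string.
print(re.fullmatch(r",*\s*", "") is not None)  # True

# Requires at least one comma or whitespace character per separator.
good = re.split(r"[\s,]+", text)
print(good)  # ['a@x.org', 'b@x.org']

# On Python 3.7+, re.split() also splits at empty matches, so this
# chops the addresses up between individual characters.
bad = re.split(r",*\s*", text)
print(bad)
```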
+2  A: 

Without re:

line = 'e@d , f@g, 7@g'

addresses = line.split(',')
addresses = [address.strip() for address in addresses]
Mykola Golubyev
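A sketch of this approach next to two of its limits: `str.split(',')` leaves an empty string between repeated commas, and it does not split on whitespace alone, such as the newline in the question's input. Swapping in a different regex-free technique (replacing commas with spaces, then calling `str.split()` with no argument) handles both, because that form splits on runs of whitespace and never returns empty strings. The sample strings are placeholders.

```python
# The answer's approach: split on commas, then strip surrounding whitespace.
line = 'e@d , f@g, 7@g'
addresses = [address.strip() for address in line.split(',')]
print(addresses)  # ['e@d', 'f@g', '7@g']

# Repeated commas leave an empty string, and whitespace-only
# separators (here a newline) are not split at all.
tricky = 'e@d,,f@g\n7@g'
naive = [a.strip() for a in tricky.split(',')]
print(naive)  # ['e@d', '', 'f@g\n7@g']

# Different technique: turn commas into spaces, then let str.split()
# with no argument split on runs of whitespace.
robust = tricky.replace(',', ' ').split()
print(robust)  # ['e@d', 'f@g', '7@g']
```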