I need to be able to take a string like:

'''foo, bar, "one, two", three four'''

into:

['foo', 'bar', 'one, two', 'three four']

I have a feeling (with hints from #python) that the solution is going to involve the shlex module.

+5  A: 

You may also want to consider the csv module. I haven't tried it, but it looks like your input data is closer to CSV than to shell syntax (which is what shlex parses).

Greg Hewgill
Agreed. Minus the enclosing ''' portions, that looks like pretty standard CSV formatting. (Well, as much as it can, without a CSV standard.)
jdmichal
@jdmichal: The ''' is just a way to quote strings in Python.
ΤΖΩΤΖΙΟΥ
+1  A: 

You could do something like this:

>>> import re
>>> pattern = re.compile(r'\s*("[^"]*"|.*?)\s*,')
>>> def split(line):
...  return [x[1:-1] if x[:1] == x[-1:] == '"' else x
...          for x in pattern.findall(line.rstrip(',') + ',')]
... 
>>> split("foo, bar, baz")
['foo', 'bar', 'baz']
>>> split('foo, bar, baz, "blub blah"')
['foo', 'bar', 'baz', 'blub blah']
Armin Ronacher
A: 

If it doesn't need to be pretty, this might get you on your way:

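def f(s, splitifeven):
    # Odd-numbered chunks sat between double quotes: keep them whole.
    if splitifeven & 1:
        return [s]
    # Even-numbered chunks are unquoted: split on commas, drop empty pieces.
    return [x.strip() for x in s.split(",") if x.strip() != '']

ss = 'foo, bar, "one, two", three four'

# Splitting on '"' alternates unquoted (even index) and quoted (odd index) chunks.
print(sum([f(s, sie) for sie, s in enumerate(ss.split('"'))], []))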
Rodrigo Queiro
+7  A: 

It depends on how complicated you want to get. Do you want to allow more than one type of quoting? How about escaped quotes?

Your syntax looks very much like the common CSV file format, which is supported by the Python standard library:

import csv
reader = csv.reader(['''foo, bar, "one, two", three four'''], skipinitialspace=True)
for r in reader:
    print(r)

Outputs:

['foo', 'bar', 'one, two', 'three four']
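
On the escaped-quotes question, here is a minimal sketch that wraps the same csv.reader call in a helper. The name split_csv_line is made up for illustration. It relies on the csv module's default dialect, in which a doubled quote ("") inside a quoted field stands for one literal quote, and it assumes the input is a single line.

```python
import csv

def split_csv_line(line):
    """Split one comma-separated line, honouring double-quoted fields."""
    # skipinitialspace=True drops the space after each comma, so a quote
    # that follows ", " is still recognised as the start of a quoted field.
    return next(csv.reader([line], skipinitialspace=True))

# The string from the question.
print(split_csv_line('foo, bar, "one, two", three four'))

# A doubled quote inside a quoted field is read as one literal quote.
print(split_csv_line('a, "say ""hi"", ok", b'))
```

Only leading spaces are skipped; a field such as 'foo ,' would keep its trailing space, so add a strip() if the data can look like that.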

HTH!

Dan
Yeah, the csv module is totally what you want here.
Electrons_Ahoy
+5  A: 

The shlex module handles escaped quotes, one kind of quote nested inside the other, and the rest of the quoting rules a POSIX shell supports.

>>> import shlex
>>> my_splitter = shlex.shlex('''foo, bar, "one, two", three four''', posix=True)
>>> my_splitter.whitespace += ','
>>> my_splitter.whitespace_split = True
>>> print(list(my_splitter))
['foo', 'bar', 'one, two', 'three', 'four']

Example with one kind of quote nested inside the other:

>>> my_splitter = shlex.shlex('''"test, a",'foo,bar",baz',bar \xc3\xa4 baz''',
...                           posix=True)
>>> my_splitter.whitespace = ','; my_splitter.whitespace_split = True
>>> print(list(my_splitter))
['test, a', 'foo,bar",baz', 'bar \xc3\xa4 baz']
nosklo
This splits up the three and four, which is not in the specification.
Rodrigo Queiro
Needs a fix for splitting the final "three four".
ΤΖΩΤΖΙΟΥ
Simply changing my_splitter.whitespace += ',' to = ',' will do it, but you still need to strip each element.
Jeremy Cantrell
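
Putting the comments above together, a minimal sketch of the shlex approach with the comma as the only separator and each token stripped afterwards. The helper name split_quoted is made up for illustration, and the sketch assumes that whitespace padding inside quotes does not need to be preserved, since strip() removes it as well.

```python
import shlex

def split_quoted(line):
    """Split on commas only, then strip the padding around each token."""
    splitter = shlex.shlex(line, posix=True)
    splitter.whitespace = ','          # commas are the only separators
    splitter.whitespace_split = True   # break tokens at separators, nowhere else
    # Spaces are no longer separators, so they stay attached to the tokens
    # and have to be stripped by hand.
    return [token.strip() for token in splitter]

print(split_quoted('foo, bar, "one, two", three four'))
```

For the string from the question this should give the four-element list that was asked for. Note that shlex still applies its other shell rules: an unmatched apostrophe (as in don't) raises ValueError, and # starts a comment by default.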
A: 

I'd say a regular expression would be what you're looking for here, though I'm not terribly familiar with Python's Regex engine.

Assuming you use lazy matches, you can get a set of matches on a string which you can put into your array.
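
One way the lazy-match idea might look in Python, as a rough sketch rather than a tested recipe. The names FIELD and split_lazy are invented here, and the sketch assumes a single line of input, double quotes only, and no escaped quotes.

```python
import re

# One field: optional padding, then either a quoted run (group 1, lazy up to
# the closing quote) or a bare run (group 2, lazy up to the separator),
# then optional padding and the terminator (group 3): a comma or end of line.
FIELD = re.compile(r'\s*(?:"(.*?)"|(.*?))\s*(,|$)')

def split_lazy(line):
    fields, pos = [], 0
    while True:
        m = FIELD.match(line, pos)
        quoted, bare, terminator = m.groups()
        fields.append(quoted if quoted is not None else bare)
        if terminator != ',':   # reached the end of the line
            break
        pos = m.end()           # continue just past the comma
    return fields

print(split_lazy('foo, bar, "one, two", three four'))
```

Matching one field at a time from an explicit position avoids the stray empty match that findall can report at the end of the string.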

Jeff