I am trying to parse a list of items that satisfies the Python regex
r'\A(("[\w\s]+"|\w+)\s+)*\Z'
that is, it's a space-separated list, except that spaces are allowed inside quoted strings. I would like to get a list of the items in it, that is, the substrings matched by the
r'("[\w\s]+"|\w+)'
part. For example:
>>> parse('foo "bar baz" "bob" ')
['foo', '"bar baz"', '"bob"']
Is there a nice way to do this with Python's re module?
Many things don't quite work. For example
>>> re.match(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar baz" "bob" ').group(2)
'"bob"'
only returns the last item it matched. On the other hand,
>>> re.findall(r'("[\w\s]+"|\w+)', 'foo "bar baz" "bob" ')
['foo', '"bar baz"', '"bob"']
returns the right items for well-formed input, but it also accepts malformed input such as
>>> re.findall(r'("[\w\s]+"|\w+)', 'foo "bar b-&&az" "bob" ')
['foo', 'bar', 'b', 'az', '" "', 'bob']
So is there any way to use the original regex and get all of the items that matched group 2? Something like this hypothetical function:
>>> re.match_multigroup(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar baz" "bob" ').group(2)
['foo', '"bar baz"', '"bob"']
>>> re.match_multigroup(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar b-&&az" "bob" ')
None
Edit: It is important that I preserve the quotes in the output, so I don't want
>>> re.match_multigroup(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar baz" "bob" ').group(2)
['foo', 'bar baz', 'bob']
because then I don't know if bob was quoted or not.
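The closest workaround I can sketch with the standard re module is a two-step approach: first validate the whole string against the anchored list pattern, then extract the items with findall. This is only a sketch, not a real multigroup feature. The names ITEM, LIST_RE and ITEM_RE are made up for illustration, and parse is just the function name from the example at the top. It assumes that, once the string has been validated, findall splits it the same way the anchored pattern did, which seems to hold here because a quoted item cannot itself contain a quote.

```python
import re

# Single-item pattern from the question (quoted string or bare word).
ITEM = r'"[\w\s]+"|\w+'

# Anchored list pattern from the question, with non-capturing groups
# because it is only used for validation here.
LIST_RE = re.compile(r'\A(?:(?:%s)\s+)*\Z' % ITEM)
ITEM_RE = re.compile(ITEM)


def parse(text):
    """Return the items of a well-formed list, or None if text is malformed."""
    if LIST_RE.match(text) is None:
        return None
    # The pattern has no capturing group, so findall returns whole matches,
    # which keeps the surrounding quotes.
    return ITEM_RE.findall(text)


print(parse('foo "bar baz" "bob" '))
print(parse('foo "bar b-&&az" "bob" '))
```

With this, the well-formed example keeps its quotes and the malformed one is rejected as a whole, at the cost of scanning the string twice. Note that, like the original regex, it requires whitespace after every item, so 'foo' with no trailing space is rejected.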