Hi all,

I've been trying to write this regex for a while now. I'd like one that matches all the spaces in a text, except those inside a string literal.

Example:

123 Foo "String with spaces"

The space between 123 and Foo would match, as would the one between Foo and "String with spaces", but only those two.

Thanks

+1  A: 

A common, simple strategy for this is to count the number of quotes leading up to your location in the string. If the count is odd, you are inside a quoted string; if the count is even, you are outside a quoted string. I can't think of a way to do this in regular expressions, but you could use this strategy to filter the results.
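The counting strategy can be sketched in a few lines of Python. This is only an illustration: the function name spaces_outside_quotes is made up for the example, and it assumes that strings are delimited by double quotes and never contain escaped quotes.

```python
def spaces_outside_quotes(text):
    """Return the indices of spaces that are not inside a double-quoted string.

    Assumption: quotes are never escaped, so every '"' toggles the state.
    """
    positions = []
    quote_count = 0  # number of double quotes seen so far
    for index, char in enumerate(text):
        if char == '"':
            quote_count += 1
        elif char == ' ' and quote_count % 2 == 0:
            # Even count: we are outside any quoted string.
            positions.append(index)
    return positions


print(spaces_outside_quotes('123 Foo "String with spaces"'))  # [3, 7]
```

For the example from the question, only the two spaces before the quoted string are reported.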

wuputah
I think you're right. Regex can't save me this time, as it can't count the occurrences of a character. Thanks
Frank
+1  A: 

You could use re.findall to match either a string or a space and then afterwards inspect the matches:

import re

# Raw string: the regex engine sees the pattern exactly as written.
pattern = r'"(?:\\.|[^\\"])*"|[ ]'
hits = re.findall(pattern, 'foo bar baz "another\\" test" and done')
for h in hits:
    print("found: [%s]" % h)

yields:

found: [ ]
found: [ ]
found: [ ]
found: ["another\" test"]
found: [ ]
found: [ ]

A short explanation:

"          # match a double quote
(?:        # start non-capture group 1
  \\.      #   match a backslash followed by any character (except line breaks)
  |        #   OR
  [^\\"]   #   match any character except a '\' and '"'
)*         # end non-capture group 1 and repeat it zero or more times
"          # match a double quote
|          # OR
[ ]        # match a single space
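To go from "inspect the matches" to the actual positions of the spaces, one option is re.finditer, which yields match objects that carry their offsets. The sketch below is only one way to do the inspection: the names TOKEN and space_positions are invented for the example, and the pattern is written as a raw string whose character class excludes both the backslash and the double quote.

```python
import re

# Either a double-quoted string (with backslash escapes) or a single space.
TOKEN = re.compile(r'"(?:\\.|[^\\"])*"|[ ]')


def space_positions(text):
    """Return the offsets of spaces that lie outside quoted strings."""
    # Quoted strings are consumed as whole matches, so the spaces inside
    # them never show up as separate hits; keep only the bare-space hits.
    return [m.start() for m in TOKEN.finditer(text) if m.group() == ' ']


print(space_positions('123 Foo "String with spaces"'))  # [3, 7]
```

Because the string alternative is tried first at each position, a space is only reported when the scan is not currently inside a quoted string.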
Bart Kiers