I'm trying to write a regex that removes all spaces that are not inside quotes, so that something like this:

a = 4, b = 2, c = "space here"

would return this:

a=4,b=2,c="space here"

I spent some time searching this site and found a similar Q&A ( http://stackoverflow.com/questions/79968/split-a-string-by-spaces-in-python#80449 ). It replaces all the spaces inside quotes with a token, wipes all the other spaces, and then substitutes the token back. I was hoping there was a cleaner way of doing it.
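For reference, here is a minimal Ruby sketch of the token approach described above. It is not taken from the linked answer. `SPACE_TOKEN` and `strip_spaces_via_token` are hypothetical names, and the sketch assumes that double quotes never nest and that the marker text never appears in the input.

```ruby
# Hypothetical marker; assumes it never occurs in the input itself.
SPACE_TOKEN = "__SPACE__"

def strip_spaces_via_token(input)
  # 1. Inside each double-quoted string, swap spaces for the token.
  protected_input = input.gsub(/"[^"]*"/) { |quoted| quoted.gsub(" ", SPACE_TOKEN) }
  # 2. Every space left is outside quotes, so delete them all.
  squeezed = protected_input.delete(" ")
  # 3. Turn the tokens back into spaces.
  squeezed.gsub(SPACE_TOKEN, " ")
end

puts strip_spaces_via_token('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```

This needs three passes and a marker that cannot collide with real data, which is why a single-regex approach is attractive.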

+1  A: 

I consider this very clean:

mystring.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join

I doubt gsub could do any better (assuming you want a pure regex approach).

Romulo A. Ceccon
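Here is a runnable check of the scan one-liner on the question's input. The second variant is a small simplification and not part of the original answer: when the regex has no capture groups, `String#scan` returns the matched strings themselves, so the `map` step can be dropped.

```ruby
input = 'a = 4, b = 2, c = "space here"'

# Original one-liner: each match is either a whole quoted string or a
# single non-space character, so spaces outside quotes are never kept.
with_groups = input.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join

# Same pattern without capture groups: scan yields plain strings.
without_groups = input.scan(/".*?"|[^ ]/).join

puts with_groups     # a=4,b=2,c="space here"
puts without_groups  # a=4,b=2,c="space here"
```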
+4  A: 

This seems to work:

result = string.gsub(/( |(".*?"))/, "\\2")
Borgar
If you get into single- and double-quoted strings, you need to match the opening and closing quote marks.
Gene T
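Following up on that comment, here is a sketch that extends the gsub idea to both kinds of quotes. It is not from the thread. Each alternative only closes on the same kind of quote that opened it. The sketch assumes quotes never nest and contain no escaped quote characters. `QUOTED_OR_SPACE` and `strip_unquoted_spaces` are hypothetical names.

```ruby
# Alternation: a double-quoted string, a single-quoted string, or a bare space.
# Only the quoted alternatives are captured, so '\1' restores them and
# replaces a bare space (group 1 unset) with the empty string.
QUOTED_OR_SPACE = /("[^"]*"|'[^']*')| /

def strip_unquoted_spaces(input)
  input.gsub(QUOTED_OR_SPACE, '\1')
end

puts strip_unquoted_spaces(%q{a = 4, b = 'x y', c = "space here"})
# prints: a=4,b='x y',c="space here"
```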
A: 

Try this one. A string in single or double quotes is also matched, so you need to filter those matches out if you only want the spaces:

/( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
Senmiao Liu
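Here is a usage sketch for the regex above with Ruby's `gsub`, as in the other answers. Group 2 captures a double-quoted string and group 4 a single-quoted one. The replacement `'\2\4'` therefore keeps whichever quoted string matched and turns a lone space into nothing. The method name is hypothetical.

```ruby
# The regex from the answer above. Capture groups:
#   1 = whole match, 2 = double-quoted string, 4 = single-quoted string
#   (groups 3 and 5 only hold the last character inside each string).
SPACE_OR_QUOTED = /( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/

def strip_unquoted_spaces_escaped(input)
  # Keep whichever quoted string matched; a bare space leaves both groups unset.
  input.gsub(SPACE_OR_QUOTED, '\2\4')
end

sample = %q{a = 1, c = "say \"hi there\""}
puts strip_unquoted_spaces_escaped(sample)
# prints: a=1,c="say \"hi there\""
```

Because of the `\\.` alternative, this pattern steps over backslash-escaped quotes. On the same input, the lazy `".*?"` pattern stops at the escaped quote and produces `a=1,c="say \"hithere\""`.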
+5  A: 

It's worth noting that any regular expression solution will fail in cases like the following:

a = 4, b = 2, c = "space" here"

While it is true that you could construct a regexp to handle the three-quote case specifically, you cannot solve the problem in the general sense. This is a mathematically provable limitation of DFAs, of which regexps are a direct representation. To perform any serious brace/quote matching, you will need a more powerful pushdown automaton, usually in the form of a text parser library (ANTLR, Bison, Parsec).

With that said, it sounds like regular expressions should be sufficient for your needs. Just be aware of the limitations.

Daniel Spiewak
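For comparison with the regex answers, here is a minimal sketch that solves the problem without a regex. It makes one pass over the characters and keeps a flag for "inside quotes". The method name is hypothetical. The sketch assumes double quotes only, with no nesting and no escaping. If an opening quote is never closed, the rest of the string counts as quoted.

```ruby
# Hypothetical helper: removes spaces outside double quotes in one pass.
def strip_unquoted_spaces_loop(input)
  out = +""            # unfrozen, mutable result string
  in_quotes = false
  input.each_char do |ch|
    in_quotes = !in_quotes if ch == '"'   # every quote toggles the state
    next if ch == " " && !in_quotes       # drop spaces outside quotes
    out << ch
  end
  out
end

puts strip_unquoted_spaces_loop('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```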
What is the 'correct' solution for this case?
rjmunro
A: 

Daniel,

The space between the double quote and 'here' is NOT inside quotes in your example.

Senmiao Liu
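To illustrate this point, here is a quick check (not from the thread) that runs Borgar's gsub one-liner on Daniel's input. The space before `here` falls outside the quoted `"space"`, so it is removed. The trailing quote has no partner, so `".*?"` cannot match there and it is left untouched.

```ruby
# Borgar's one-liner applied to Daniel's input.
input = 'a = 4, b = 2, c = "space" here"'
result = input.gsub(/( |(".*?"))/, "\\2")
puts result
# prints: a=4,b=2,c="space"here"
```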