What should happen to "a" b "c"? Note that in the substring " b " the spaces are between quotes.
-- edit --
I assume a space is "between quotes" if it is preceded and followed by an odd number of standard quotation marks (i.e. U+0022, I'll ignore those funny Unicode “quotes”).
That means you need the following regex:
^[^"]*("[^"]*"[^"]*)*"[^"]* [^"]*"[^"]*("[^"]*"[^"]*)*$
("[^"]*"[^"]*) represents a pair of quotes, so ("[^"]*"[^"]*)* is an even number of quotes and ("[^"]*"[^"]*)*" an odd number. Then comes the actual quoted string part containing the space, followed by another odd number of quotes. The ^ and $ anchors are needed because you have to count every quote from the beginning of the string. This answers the " b " substring problem above by never looking at substrings: in "a" b "c", both spaces have two quotes before them, so neither counts as quoted. The price is that every character in your input must be tested against the entire string, which turns this into an O(N*N) split operation.
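To make the parity argument concrete, here is a minimal Python sketch that applies the regex to whole strings. The function name has_quoted_space is made up for illustration; the pattern is the one given above, and the function only reports whether the string contains at least one space with an odd number of quotes on each side.

```python
import re

# The pattern from the answer: an even number of quotes, one opening quote,
# the text containing the space, one closing quote, an even number of quotes.
QUOTED_SPACE = re.compile(
    r'^[^"]*("[^"]*"[^"]*)*"[^"]* [^"]*"[^"]*("[^"]*"[^"]*)*$'
)

def has_quoted_space(text):
    """Return True if some space has an odd number of quotes on both sides."""
    return QUOTED_SPACE.match(text) is not None

print(has_quoted_space('"a b" c'))    # the space inside "a b" qualifies
print(has_quoted_space('"a" b "c"'))  # both spaces have two quotes before them
```

Because the match is anchored at both ends, answering the question for one particular space means rescanning the whole string, which is where the quadratic cost comes from.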
The reason you can do this in a regex is that only a finite amount of memory is needed, effectively just one bit: "have I seen an odd or even number of quotes so far?" You don't actually have to match up individual "" pairs.
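The one-bit observation can be written down directly as a two-state machine. This is only a sketch under the parity interpretation above; the names OUTSIDE, INSIDE and quoted_space_indices are hypothetical.

```python
OUTSIDE, INSIDE = 0, 1  # the single bit: even or odd number of quotes so far

def quoted_space_indices(text):
    """Return the indices of spaces preceded by an odd number of quotes."""
    state = OUTSIDE
    hits = []
    for i, ch in enumerate(text):
        if ch == '"':
            # Every quote flips the bit; pairs are never matched up explicitly.
            state = INSIDE if state == OUTSIDE else OUTSIDE
        elif ch == ' ' and state == INSIDE:
            hits.append(i)
    return hits

print(quoted_space_indices('"a b" c'))    # [2]
print(quoted_space_indices('"a" b "c"'))  # []
```

Note that this only checks the quotes before each space. If the total number of quotes in the string is even, the count after the space is then odd as well.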
This is not the only possible interpretation, though. If you do include “funny Unicode quotes”, which should be paired, you also need to deal with ““double quoted”” strings. This in turn means you need a count of open “ marks, which means you need unbounded storage, which in turn means it's no longer a regular language, which means you can't use a regex. QED.
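For comparison, here is a sketch of what the paired-quote interpretation needs: a counter instead of a single bit. The function name and the decision to ignore an unmatched closing quote are my own assumptions, not something the question specifies.

```python
OPEN_QUOTE = "\u201c"   # left double quotation mark
CLOSE_QUOTE = "\u201d"  # right double quotation mark

def spaces_inside_nested_quotes(text):
    """Return the indices of spaces that sit inside at least one open quote."""
    depth = 0  # number of currently open quotes; has no fixed upper bound
    hits = []
    for i, ch in enumerate(text):
        if ch == OPEN_QUOTE:
            depth += 1
        elif ch == CLOSE_QUOTE:
            # Assumption: an unmatched closing quote is simply ignored.
            depth = max(depth - 1, 0)
        elif ch == ' ' and depth > 0:
            hits.append(i)
    return hits

sample = OPEN_QUOTE * 2 + "a b" + CLOSE_QUOTE * 2 + " c"
print(spaces_inside_nested_quotes(sample))  # [3]
```

The depth variable can grow without limit as the nesting gets deeper, which is exactly the storage a finite automaton doesn't have.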
Anyway, even if it were possible, you would still want a proper parser. The O(N*N) behavior of counting the quotes preceding each character just isn't funny. If you already know there are X quotes preceding Str[N], determining how many quotes precede Str[N+1] should be an O(1) operation, not O(N). After all, the only possible answers are X and X+1!
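A minimal sketch of such a parser, assuming the goal is to split on spaces that are not between U+0022 quotes (the function name split_outside_quotes is hypothetical). It carries the quote parity along as it scans, so each character costs O(1) and the whole split is O(N).

```python
def split_outside_quotes(text):
    """Split text on spaces that are preceded by an even number of quotes."""
    tokens = []
    current = []
    in_quotes = False  # parity of the quotes seen so far, updated in O(1)
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ' ' and not in_quotes:
            # Unquoted space: close the current token and start a new one.
            tokens.append(''.join(current))
            current = []
        else:
            current.append(ch)
    tokens.append(''.join(current))
    return tokens

print(split_outside_quotes('"a" b "c"'))  # ['"a"', 'b', '"c"']
print(split_outside_quotes('"a b" c'))    # ['"a b"', 'c']
```

Like str.split(' '), this yields empty tokens for consecutive unquoted spaces, and it keeps the quote characters in the output. Whether either is desirable depends on what the split is for.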