I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:

keywords = 'pop rock "hard rock"';
keywords = keywords.match(/\w+|"[^"]+"/g);
console.log(keywords); // [pop, rock, "hard rock"]

However, I also want it to be possible to have quotes in keywords, like this:

keywords = 'pop rock "hard rock" "\"dream\" pop"';

This should return

[pop, rock, "hard rock", "\"dream\" pop"]

What's the easiest way to achieve this?

+2  A: 

You can change your regex to:

keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g);

Instead of [^"]+ you've got (?:\\"|[^"])+ which should be self-explanatory - it allows \" or any other character, but not an unescaped quote.

One important note: if you want the string itself to contain a literal backslash, you have to escape it in the JavaScript string literal:

keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
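
To make the difference concrete, here is a small sketch that runs the regex from this answer against both spellings of the string. The variable names (`pattern`, `escaped`, `unescaped`) are illustrative only and not from the question.

```javascript
// Regex from the answer: bare words, or quoted segments that may contain \" pairs.
const pattern = /\w+|"(?:\\"|[^"])+"/g;

// Backslashes doubled in the literal, so the string really contains \" sequences.
const escaped = 'pop rock "hard rock" "\\"dream\\" pop"';
const keywords = escaped.match(pattern);
console.log(keywords.length); // 4
console.log(keywords[3]);     // "\"dream\" pop"

// Single backslashes: JavaScript drops them, leaving plain quotes in the string.
const unescaped = 'pop rock "hard rock" "\"dream\" pop"';
const broken = unescaped.match(pattern);
console.log(broken.length);   // 5
console.log(broken[3]);       // "dream"
```

In the second case the regex never sees a backslash, so the segment is split at the inner quotes.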

Also, there's a slight inconsistency between \w+ and [^"]+ - for example, it will match the word "ab*d", but not ab*d (without quotes), because * is not a word character. Consider using [^"\s]+ instead, which matches any run of characters that are neither spaces nor quotes.
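
A quick sketch of that inconsistency, comparing the two alternatives on the same input. The names `wordOnly` and `nonSpace` are made up for this example.

```javascript
// Original first alternative: \w+ only matches letters, digits and underscore.
const wordOnly = /\w+|"(?:\\"|[^"])+"/g;
// Suggested first alternative: any run of characters that are not quotes or whitespace.
const nonSpace = /[^"\s]+|"(?:\\"|[^"])+"/g;

const input = 'ab*d "ab*d"';

const withWord = input.match(wordOnly);
console.log(withWord);     // [ 'ab', 'd', '"ab*d"' ]

const withNonSpace = input.match(nonSpace);
console.log(withNonSpace); // [ 'ab*d', '"ab*d"' ]
```

With \w+ the unquoted ab*d is broken into two keywords and the * is dropped. With [^"\s]+ it stays in one piece.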

Kobi
I suggest you use `\\.` instead of `\\"` because backslashes can be escaped too, and you wouldn't want to miss `"foo\\\\"`.
Tim Pietzcker
@Tim - interesting idea at first, but I'm not sure it's necessary - wouldn't `[^"]` handle these cases? Am I missing something?
Kobi
Consider this: In the string `"\\" "foo"` (just two backslashes for clarity), the first `"` would be matched by the literal `"` at the start of the regex. Then the `[^"]` would match the first \. Then the remaining `\"` would be matched by `\\"` (because it comes first in the alternation). Then `[^"]` would match the space and the `"` (at the end of the regex) would match the opening quote of `"foo"`, disrupting the parsing.
Tim Pietzcker
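
The trace in the comment above can be checked with a short sketch. The backslashes are doubled again in the JavaScript literal so that the string holds exactly two of them; `pattern`, `input` and `result` are illustrative names.

```javascript
// Regex from the answer, which only treats \" as an escape.
const pattern = /\w+|"(?:\\"|[^"])+"/g;

// The string contains: "\\" "foo"  (two backslashes between the first pair of quotes)
const input = '"\\\\" "foo"';

const result = input.match(pattern);
console.log(result.length); // 2
console.log(result[0]);     // "\\" "
console.log(result[1]);     // foo
```

The first match runs past the intended closing quote and ends on the opening quote of "foo", so foo is then picked up as a bare word.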
It works just like it should. "(?:\\"|[^"])+ which should be self-explanatory" < not really ;-), I never used this in regexps before, a colleague had to explain it to me. "Consider using [^"\s]+ instead" < This is something I already adjusted. Thanks for your help!
Blaise Kal
@Tim - That would be the same for `"A\" "foo"` - the second quote is escaped, which fits the requirements here, but I get your point - it's a good idea not to allow lonely backslashes, something like `\w+|"(?:\\.|[^"\\])+"`.
Kobi
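
As a rough check of the regex suggested in the comment above, this sketch runs it on a string with an escaped backslash and on the string with escaped quotes from the question. The names are illustrative and the inputs are only these two cases, not an exhaustive test.

```javascript
// Backslash followed by any character is consumed as a pair;
// everything else inside quotes must be neither a quote nor a backslash.
const pattern = /\w+|"(?:\\.|[^"\\])+"/g;

// The string contains: "\\" "foo"  (an escaped backslash, then a second segment)
const backslashes = '"\\\\" "foo"'.match(pattern);
console.log(backslashes.length); // 2
console.log(backslashes[0]);     // "\\"
console.log(backslashes[1]);     // "foo"

// The string contains: pop "\"dream\" pop"
const quotes = 'pop "\\"dream\\" pop"'.match(pattern);
console.log(quotes.length);      // 2
console.log(quotes[1]);          // "\"dream\" pop"
```

Because a backslash can now only be matched together with the character after it, the escaped backslash no longer swallows the closing quote.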
@Blaise - no problem. It's a small leap from `[^"]` to `\\"|[^"]`, but then you need a group, and you might as well make it a non-capturing group... I guess I took too many steps at once `:P`
Kobi
@Kobi: It would not be the same for `"A\" "foo"` - in `"\\" "foo"` the second quote is *not* escaped.
Tim Pietzcker
@Tim - I completely got your point. `:)`
Kobi