Hello,

I want to take a string such as the following:

Guiness Harp "Holy Moses"

and, in C# or VB, get a match set of:

Guiness
Harp
Holy Moses

Essentially, it should split on the spaces, except that any words enclosed in quotes are treated as a single phrase.

Thanks, Kevin

+4  A: 

If you don't have any (escaped or doubled) quotes inside your quoted strings, you could search for

 "[^"]*"|\S+

However, the quotes will be part of the match. The regex can be extended to also handle quotes inside quoted strings if necessary.
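To illustrate, here is a minimal Python sketch of that pattern. The names TOKEN, TOKEN_ESCAPED and tokenize are made up for this example. The second pattern is only one possible extension, and it assumes that quotes inside a quoted string are escaped with a backslash; doubled quotes would need a different pattern.

```python
import re

# The pattern from above: a quoted run, or else a run of non-whitespace.
TOKEN = re.compile(r'"[^"]*"|\S+')

# Possible extension (assumption: inner quotes are backslash-escaped, e.g. \").
TOKEN_ESCAPED = re.compile(r'"(?:[^"\\]|\\.)*"|\S+')


def tokenize(text, pattern=TOKEN):
    """Return all tokens; quoted tokens keep their surrounding quotes."""
    return pattern.findall(text)


if __name__ == "__main__":
    print(tokenize('Guiness Harp "Holy Moses"'))
    print(tokenize(r'say "a \"b\" c" end', TOKEN_ESCAPED))
```

Note that the quoted alternative has to come first; otherwise \S+ would grab the opening quote together with the first word.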

Another (and in this case preferable) possibility would be to use a CSV parser.

For example (Python):

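import csv

# newline='' is what the csv docs recommend for files passed to csv.reader
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=' ', quotechar='"')
    for row in reader:
        print(row)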
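If the text is already in memory rather than in a file, the same idea can be sketched as follows. The helper name split_line is hypothetical; it relies on csv.reader accepting any iterable of strings.

```python
import csv


def split_line(line):
    """Split one line on spaces, treating "..." as a single field."""
    # csv.reader takes any iterable of strings, so a one-element list works.
    return next(csv.reader([line], delimiter=' ', quotechar='"'))


if __name__ == "__main__":
    print(split_line('Guiness Harp "Holy Moses"'))
```

Here the quotes are removed by the parser, and doubled quotes inside a quoted field are handled as well. One caveat: with a space as the delimiter, two consecutive spaces produce an empty field.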
Tim Pietzcker
Yep, using an existing parser like this definitely makes more sense than trying to re-invent one. Oh, and congratulations on getting to 10k rep. :)
Peter Boughton
For the simple application I have, I am not very worried about escaped characters. There are not many users for the app, and what I gain from the regex outweighs the negatives in this case. I also found another, far more complex expression that also takes out the quotes. It may do other things too, but regular expressions are not high on my list of skills.

(?<=(?:^|\s|,)")[^"]*?(?=")|(?<=\s|^)(?!")[\w\W]+?(?=\s|$)

Thanks for your help! Kevin
Grandizer
@Peter: Thanks! Looks like you're next :)
Tim Pietzcker
A: 

Regular expressions in the classic sense can't count (they can't match arbitrarily nested or balanced delimiters), which makes delimiter parsing difficult.

I would use a parser rather than regular expressions for this.
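As a rough idea of what such a parser could look like, here is a small hand-written scanner in Python. The function name split_quoted is invented for this sketch, and it assumes there is no escaping: every double quote simply toggles the "inside quotes" state.

```python
def split_quoted(text):
    """Split on whitespace, keeping double-quoted runs together."""
    tokens = []
    current = []
    in_quotes = False
    started = False  # True once a token has begun (so "" yields an empty token)
    for ch in text:
        if ch == '"':
            # Assumption: no escaping, each quote just flips the state.
            in_quotes = not in_quotes
            started = True
        elif ch.isspace() and not in_quotes:
            if started:
                tokens.append(''.join(current))
                current = []
                started = False
        else:
            current.append(ch)
            started = True
    if started:
        # An unterminated quote ends up here: the rest becomes one token.
        tokens.append(''.join(current))
    return tokens


if __name__ == "__main__":
    print(split_quoted('Guiness Harp "Holy Moses"'))
```

Because the state is explicit, it is easy to add rules later (escapes, other quote characters) without rewriting a pattern.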

Ben S
A: 

If this is simple parsing, you may be able to just trim the starting and ending quotes from each match.

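// Needs: using System; using System.Text.RegularExpressions;
string text = "Guiness Harp \"Holy Moses\"";
// A quoted run (quotes included), or else a run of non-whitespace.
string pattern = @"""[^""]*""|\S+";

MatchCollection matches = Regex.Matches( text, pattern );
foreach( Match match in matches )
{
    // Strip the surrounding quotes from quoted matches.
    string value = match.Value.Trim( '"' );
    Console.Out.WriteLine( value );
}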

However, this implementation isn't very flexible. I'd only use something like this in an internal tool, or where you don't mind throwing the code away later.
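To make the limitation concrete, here is the same match-then-trim idea sketched in Python (match_then_trim is a made-up name). Python's str.strip('"') behaves like Trim('"') here: it removes every leading and trailing quote, whether or not the token was actually quoted.

```python
import re

# Same pattern as in the C# snippet: quoted run, or run of non-whitespace.
TOKEN = re.compile(r'"[^"]*"|\S+')


def match_then_trim(text):
    """Match tokens with quotes included, then strip quotes from both ends."""
    return [tok.strip('"') for tok in TOKEN.findall(text)]


if __name__ == "__main__":
    print(match_then_trim('Guiness Harp "Holy Moses"'))
    # The trailing inch mark is not a delimiter, but it gets stripped anyway.
    print(match_then_trim('screen 15"'))
```

In the second call the unquoted token 15" loses its quote character, which is the kind of edge case that makes this approach fragile.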

Jerod Houghtelling
+1  A: 

Here's another approach:

string s0 = @"Guiness Harp ""Holy Moses""";
Regex r = new Regex(@"""(?<FIELD>[^""]*)""|(?<FIELD>\S+)");
foreach (Match m in r.Matches(s0))
{
  Console.WriteLine(m.Groups["FIELD"].Value);
}

This takes advantage of the fact that .NET regexes let you reuse group names within the same regex. Very few regex flavors allow that, and of those only Perl 6 is as flexible about it as .NET.
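For comparison, Python's standard re module is one of the flavors that rejects a repeated group name, so a port has to use two groups and pick whichever one matched. A minimal sketch (the names FIELD, fields and duplicate_names_rejected are only for illustration):

```python
import re

# Group 1 captures the inside of a quoted field, group 2 an unquoted field.
FIELD = re.compile(r'"([^"]*)"|(\S+)')


def fields(text):
    """Return the fields with surrounding quotes already removed."""
    out = []
    for m in FIELD.finditer(text):
        quoted, bare = m.group(1), m.group(2)
        # Compare against None: an empty quoted field "" is a valid match.
        out.append(quoted if quoted is not None else bare)
    return out


def duplicate_names_rejected():
    """Check that re refuses the .NET-style pattern with a reused name."""
    try:
        re.compile(r'"(?P<FIELD>[^"]*)"|(?P<FIELD>\S+)')
    except re.error:
        return True
    return False


if __name__ == "__main__":
    print(fields('Guiness Harp "Holy Moses"'))
    print(duplicate_names_rejected())
```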

Alan Moore