Suppose I have a string like this:

one two three "four five six" seven eight

and I want to convert it to this:

one,two,three,"four five six",seven,eight

What's the easiest way to do this in C#?

A: 

I would use the Regex class for this purpose.

Regular expressions can match your input and break it down into individual groups, which you can then reassemble however you want. You can find documentation on the regex classes here.

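// Match either a whole quoted run or a run of non-whitespace characters.
Regex rx = new Regex("\"[^\"]*\"|\\S+");
MatchCollection matches = rx.Matches("first second \"third fourth fifth\" sixth");
// Cast<Match>() (System.Linq) is needed where MatchCollection is only a non-generic IEnumerable.
string result = string.Join(",", matches.Cast<Match>().Select(x => x.Value).ToArray());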
LBushkin
How can you use Regex to tackle this problem? In regex I don't think there would be a way to know if you're inside a quote or not...
Meta-Knight
Regex in .NET supports both look-ahead and pair exclusion matching.
LBushkin
What if you have more than two quotes? Would that still work?
Meta-Knight
If you mean that you can have escaped quotes, then you may have to enhance the regular expression to use look-ahead to skip quotes that are escaped. For example, q(?!u) will only match 'q' if it is not followed by a 'u'. So, in the common case of using two quotes to act as an escape, you could use ["](?!["]) as a look-ahead exclusion rule. Depending on exactly how you want to escape quotes, you may need to use other techniques, such as a zero-width look-behind assertion. Check out: 'http://www.regular-expressions.info/lookaround.html'
LBushkin
@Meta-Knight - You can with balanced grouping. Not that I'd recommend that approach...it's incredibly obtuse. http://www.codeproject.com/KB/recipes/RegEx_Balanced_Grouping.aspx
Mark Brackett
Regex can do some cool things, but it's crazy slow.
Robert Harvey
+9  A: 

Assuming that quotes cannot be escaped, you can do the following.

public string SpaceToComma(string input) { 
  var builder = new System.Text.StringBuilder();
  var inQuotes = false;
  foreach ( var cur in input ) {
    switch ( cur ) { 
      case ' ':
         builder.Append(inQuotes ? cur : ',');
         break;
      case '"':
         inQuotes = !inQuotes;
         builder.Append(cur);
         break;
      default:
         builder.Append(cur);
         break;
    }
  }
  return builder.ToString();
}
JaredPar
+2  A: 
 static string Space2Comma(string s)
 {
    return string.Concat(s.Split('"').Select
        ((x, i) => i % 2 == 0 ? x.Replace(' ', ',') : '"' + x + '"').ToArray());
 }
Mehrdad Afshari
Doh... I was gonna try LINQ-ify it but you beat me to it :-)
chakrit
A: 

My first guess is to use a parser that's already written and simply change the delimiter and quote character to fit your needs (which are a space and " respectively).

It looks like this is available to you in C#: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx

Perhaps if you changed the delimiter to " ", it would suit your needs: read in the file, and then it's just a matter of calling String.Join() for each line.
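
A minimal sketch of that idea, assuming the project references the Microsoft.VisualBasic assembly. The class and method names are hypothetical. Note that ReadFields() returns the fields without their enclosing quotes, so this sketch puts quotes back around any field that contains a space; that re-quoting rule is an assumption, not something TextFieldParser does for you.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

static class TextFieldParserSketch
{
    // Hypothetical helper: converts one space-delimited line to a comma-delimited one.
    public static string SpaceToComma(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields() returns null when there is no more data.
            string[] fields = parser.ReadFields() ?? new string[0];

            // The parser strips the enclosing quotes, so restore them
            // on fields that contain a space (an assumption of this sketch).
            var requoted = fields.Select(f => f.Contains(" ") ? "\"" + f + "\"" : f);
            return string.Join(",", requoted.ToArray());
        }
    }
}
```

For a file, the same parser can be constructed from a path and ReadFields() called in a loop until EndOfData is true.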

llamaoo7
A: 

Here's a more reusable function that I came up with:

private string ReplaceWithExceptions(string source, char charToReplace, 
    char replacementChar, char exceptionChar)
{
    bool ignoreReplacementChar = false;
    char[] sourceArray = source.ToCharArray();

    for (int i = 0; i < sourceArray.Length; i++)
    {
        if (sourceArray[i] == exceptionChar)
        {
            ignoreReplacementChar = !ignoreReplacementChar;
        }
        else
        {
            if (!ignoreReplacementChar)
            {
                if (sourceArray[i] == charToReplace)
                {
                    sourceArray[i] = replacementChar;
                }
            }
        }
    }

    return new string(sourceArray);
}

Usage:

string test = "one two three \"four five six\" seven eight";
System.Diagnostics.Debug.WriteLine(ReplaceWithExceptions(test, ' ', ',', '"'));
raven
A: 

This may be overkill, but if you believe the problem may generalize, such as needing to split on other types of characters or having additional rules that define a token, you should consider either using a parser generator such as Coco/R or writing a simple one on your own. Coco/R, for instance, will generate a lexer and parser from an EBNF grammar you provide. The lexer will be a DFA, or a state machine, which is a generalized form of the code provided by JaredPar. Your grammar definition for Coco/R would look like this:

CHARACTERS
alphanum = 'A'..'Z' + 'a'..'z' + '0'..'9'.

TOKENS
unit   = '"' {alphanum|' '} '"' | {alphanum}.

Then the produced lexer will scan and tokenize your input accordingly.
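
For comparison, here is a rough hand-written sketch of the kind of state machine such a lexer boils down to. It is not Coco/R output, and the names UnitLexer and Tokenize are hypothetical. Unlike the grammar above, it accepts any non-space character in a token rather than only alphanumerics, which is an assumption made to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class UnitLexer
{
    // Hypothetical helper: returns one token per unit, where a unit is either
    // a quoted run (quotes kept) or a run of characters without spaces.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false; // the only state this machine tracks

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (c == ' ' && !inQuotes)
            {
                // A space outside quotes ends the current token, if any.
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
            }
        }

        // Flush the last token.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

Once the input is a list of tokens, producing the comma-separated form is just string.Join(",", UnitLexer.Tokenize(input).ToArray()), and other output formats come for free.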

eulerfx
A: 

Per my comment to the original question, if you don't need the quotes in the final result, this will get the job done. If you do need the quotes, feel free to ignore this.

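private String SpaceToComma(string input)
{
    // Keep empty entries while splitting so that even indexes are always
    // outside quotes, even when the input starts with a quote.
    String[] temp = input.Split('"');
    for (Int32 i = 0; i < temp.Length; i += 2)
    {
        temp[i] = temp[i].Trim().Replace(' ', ',');
    }
    // Drop the empty pieces only when joining (needs System.Linq).
    return String.Join(",", temp.Where(t => t.Length > 0).ToArray());
}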
JeffK
A: 

@Mehrdad beat me to it but guess I'll post it anyway:

static string Convert(string input)
{
    var slices = input
        .Split('"')
        .Select((s, i) => i % 2 != 0
            ? @"""" + s + @""""
            : s.Trim().Replace(' ', ','));

    return string.Join(",", slices.ToArray());
}

LINQified and tested :-) ... For a full console app: http://pastebin.com/f23bac59b

chakrit