Suppose I have a string like this:
one two three "four five six" seven eight
and I want to convert it to this:
one,two,three,"four five six",seven,eight
What's the easiest way to do this in C#?
I would use the Regex class for this purpose.
Regular expressions can match your input and break it down into individual groups, which you can then reassemble however you want. You can find documentation on the regex classes here.
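// Requires: using System.Linq; using System.Text.RegularExpressions;
// Match either a double-quoted run or a run of non-whitespace characters.
Regex rx = new Regex(@"""[^""]*""|\S+");
MatchCollection matches = rx.Matches("first second \"third fourth fifth\" sixth");
string result = string.Join(",", matches.Cast<Match>().Select(x => x.Value).ToArray());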
Assuming that quotes cannot be escaped inside a quoted section, you can do the following.
public string SpaceToComma(string input) {
    var builder = new System.Text.StringBuilder();
    var inQuotes = false;
    foreach ( var cur in input ) {
        switch ( cur ) {
            case ' ':
                builder.Append(inQuotes ? cur : ',');
                break;
            case '"':
                inQuotes = !inQuotes;
                builder.Append(cur);
                break;
            default:
                builder.Append(cur);
                break;
        }
    }
    return builder.ToString();
}
static string Space2Comma(string s)
{
    return string.Concat(s.Split('"').Select
        ((x, i) => i % 2 == 0 ? x.Replace(' ', ',') : '"' + x + '"').ToArray());
}
My first guess is to use a parser that's already written and simply change the delimiter and quote character to fit your needs (which are a space and " respectively).
It looks like this is available to you in C#: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
Perhaps if you change the delimiter to " ", it will suit your needs to read in the file, and then it's just a matter of calling String.Join() for each line.
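As a rough sketch of that idea, assuming a single line of input rather than a file: the class name TextFieldParserSketch and the method SpaceToComma are made up for illustration, while TextFieldParser, SetDelimiters, HasFieldsEnclosedInQuotes and ReadFields come from Microsoft.VisualBasic.FileIO. Note that ReadFields drops the enclosing quotes, so the sketch puts them back around any field that contains a space.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO; // on .NET Framework, add a reference to Microsoft.VisualBasic.dll

public static class TextFieldParserSketch
{
    // Hypothetical helper: turns one line of space-separated fields
    // into comma-separated fields, keeping quoted sections together.
    public static string SpaceToComma(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is no data left to read.
            string[] fields = parser.ReadFields() ?? new string[0];

            // The parser strips the enclosing quotes, so restore them
            // around any field that contains a space.
            return string.Join(",", fields
                .Select(f => f.Contains(" ") ? "\"" + f + "\"" : f)
                .ToArray());
        }
    }
}
```

Calling TextFieldParserSketch.SpaceToComma on the string from the question should give one,two,three,"four five six",seven,eight. For a file, you would construct the parser from the file path and call ReadFields in a loop until EndOfData is true.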
Here's a more reusable function that I came up with:
private string ReplaceWithExceptions(string source, char charToReplace,
    char replacementChar, char exceptionChar)
{
    bool ignoreReplacementChar = false;
    char[] sourceArray = source.ToCharArray();
    for (int i = 0; i < sourceArray.Length; i++)
    {
        if (sourceArray[i] == exceptionChar)
        {
            ignoreReplacementChar = !ignoreReplacementChar;
        }
        else
        {
            if (!ignoreReplacementChar)
            {
                if (sourceArray[i] == charToReplace)
                {
                    sourceArray[i] = replacementChar;
                }
            }
        }
    }
    return new string(sourceArray);
}
Usage:
string test = "one two three \"four five six\" seven eight";
System.Diagnostics.Debug.WriteLine(ReplaceWithExceptions(test, ' ', ',', '"'));
This may be overkill, but if you believe the problem may generalize, such as needing to split on other types of characters or having additional rules that define a token, you should consider either using a parser generator such as Coco/R or writing a simple one on your own. Coco/R, for instance, will generate a lexer and parser from an EBNF grammar you provide. The lexer will be a DFA, or a state machine, which is a generalized form of the code provided by JaredPar. Your grammar definition for Coco/R would look like this:
CHARACTERS
alphanum = 'A'..'Z' + 'a'..'z' + '0'..'9'.
TOKENS
unit = '"' {alphanum|' '} '"' | {alphanum}.
Then the produced lexer will scan and tokenize your input accordingly.
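If you go the hand-written route instead, a minimal sketch of such a lexer might look like the following. The names SimpleLexer and Tokenize are invented for this example, and it is looser than the grammar above: it assumes any run of non-space characters is a token (not just alphanumerics) and that quotes cannot be escaped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SimpleLexer
{
    // Splits the input into tokens. A token is either a run of non-space
    // characters or a double-quoted section (quotes kept, no escapes).
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                // Entering or leaving a quoted section; keep the quote itself.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (c == ' ' && !inQuotes)
            {
                // A space outside quotes ends the current token, if there is one.
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Length = 0;
                }
            }
            else
            {
                current.Append(c);
            }
        }

        // Flush the last token.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

Once you have the tokens, string.Join(",", SimpleLexer.Tokenize(input).ToArray()) produces the comma-separated form, and the same token list can be reused if the output rules change later.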
Per my comment to the original question, if you don't need the quotes in the final result, this will get the job done. If you do need the quotes, feel free to ignore this.
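private String SpaceToComma(string input)
{
    // Keep empty entries so even indexes always fall outside the quotes,
    // even when the input starts with a quote.
    String[] temp = input.Split('"');
    for (Int32 i = 0; i < temp.Length; i += 2)
    {
        temp[i] = temp[i].Trim().Replace(' ', ',');
    }
    // Skip empty pieces so no stray commas appear (requires System.Linq).
    return String.Join(",", temp.Where(t => t.Length > 0).ToArray());
}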
@Mehrdad beat me to it but guess I'll post it anyway:
static string Convert(string input)
{
    var slices = input
        .Split('"')
        .Select((s, i) => i % 2 != 0
            ? @"""" + s + @""""
            : s.Trim().Replace(' ', ','))
        // Drop empty slices so a leading or trailing quote adds no stray comma.
        .Where(s => s.Length > 0);
    return string.Join(",", slices.ToArray());
}
LINQified and tested :-) ... For a full console app: http://pastebin.com/f23bac59b