I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split...

String:
111,222,"33,44,55",666,"77,88","99"

I want the output:
111
222
33,44,55
666
77,88
99

I have tried this:
(?:,?)((?<=")[^"]+(?=")|[^",]+)
But it reads the comma between "77,88","99" as a hit and I get the following output:
111
222
33,44,55
666
77,88
,
99

Can anybody help me? I'm running out of hours... :) /Peter

+6  A: 

Don't reinvent a CSV parser, try FileHelpers.

Darin Dimitrov
+1  A: 

I have used this csv reader--it is fast!

ShellShock
+3  A: 

Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!

You can do so with a simple regex:

(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

This will do the following:

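(?:^|,) = A non-capturing group that matches either the beginning of the string or a comma.

(\"(?:[^\"]+|\"\")*\"|[^,]*) = A numbered capture group that selects between 2 alternatives:

  1. a quoted field, which may contain commas and doubled quotes ("")
  2. an unquoted field, i.e. everything up to the next comma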

This should give you the output you are looking for, once the leading comma and the surrounding quotes are trimmed from each match.

Example code in C#

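using System;
using System.Text.RegularExpressions;

public void SplitCSV(string input)
{
    // Each match is one field, including its leading comma (if any).
    Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

    foreach (Match match in csvSplit.Matches(input))
    {
        // Drop the leading separator; quoted fields still carry their quotes here.
        Console.WriteLine(match.Value.TrimStart(','));
    }
}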

private void button1_Click(object sender, RoutedEventArgs e)
{
    SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\"");
}
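
A minimal sketch, not part of the original answer, of how the trimming of commas and quotes could be done with the capture group instead of match.Value. The class name CsvRegexSketch and the method SplitCsvLine are hypothetical names chosen for illustration. The pattern is the same one as above, written as a verbatim string. It has only been reasoned through for lines shaped like the one in the question, so treat it as a starting point rather than a complete CSV reader.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexSketch
{
    // Same pattern as in the answer, written as a verbatim string ("" is one quote).
    private static readonly Regex CsvSplit =
        new Regex(@"(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)", RegexOptions.Compiled);

    // Hypothetical helper: returns the fields of one line
    // without separators or enclosing quotes.
    public static List<string> SplitCsvLine(string input)
    {
        var fields = new List<string>();
        foreach (Match match in CsvSplit.Matches(input))
        {
            // Group 1 holds the field without the leading comma.
            string field = match.Groups[1].Value;

            bool quoted = field.Length >= 2
                && field[0] == '"'
                && field[field.Length - 1] == '"';
            if (quoted)
            {
                // Strip the enclosing quotes and turn doubled quotes into single ones.
                field = field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(field);
        }
        return fields;
    }
}
```

Calling CsvRegexSketch.SplitCsvLine("111,222,\"33,44,55\",666,\"77,88\",\"99\"") and writing each element with Console.WriteLine should print the six lines asked for in the question.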
jimplode
How about some sample code? As it is, this answer makes no sense.
Alan Moore
Without a code example it made perfect sense, and no example was given because I had no idea what language he is writing in. I have now included a sample in C#.
jimplode
Thanks! This is helping me move on. I did mention the coding language in the tags, but I will write it more clearly in the text next time.
Peter Norlén
Can you mark this as the correct answer? Hopefully it will help other people who need to parse their own CSV because they cannot use an existing CSV parser.
jimplode
I used your solution with some trimming of the matches for commas and quotes, and that produces the result I need. Thanks a lot! I was close to the solution but you gave me the last push :o) Ok, uhm... where do I mark it as the correct answer? I'm a little new to this site...
Peter Norlén
Aha, think I found it...
Peter Norlén
Glad I could help!! :) I like Regex!!
jimplode
+1  A: 

Try this:

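       // Requires: using System; using System.Collections.Generic; using System.Linq;
       string s = @"111,222,""33,44,55"",666,""77,88"",""99""";

       List<string> result = new List<string>();

       // Split on the quote character, then drop the bare "," pieces
       // left between two adjacent quoted fields.
       var splitted = s.Split('"').ToList<string>();
       splitted.RemoveAll(x => x == ",");
       foreach (var it in splitted)
       {
           // Pieces that start or end with a comma are treated as unquoted runs
           // and split further. Caveat: a quoted field that itself starts or ends
           // with a comma would be split too.
           if (it.StartsWith(",") || it.EndsWith(","))
           {
               var tmp = it.TrimEnd(',').TrimStart(',');
               result.AddRange(tmp.Split(','));
           }
           else
           {
               // Anything else is taken as a single field; empty pieces are skipped.
               if (!string.IsNullOrEmpty(it)) result.Add(it);
           }
       }

       // Results:
       foreach (var it in result)
       {
           Console.WriteLine(it);
       }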
Andrzej Nosal
A: 

I once had to do something similar and in the end I got stuck with regular expressions. Regex's inability to keep state makes it pretty tricky, so I just ended up writing a simple little parser.

If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
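
The small hand-written parser mentioned above is not shown in this answer. As an illustration only, here is a minimal sketch of what such a stateful parser could look like for a single line. The names CsvLineParserSketch and ParseLine are hypothetical, and the handling of doubled quotes ("") as an escaped quote is an assumption based on common CSV conventions, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineParserSketch
{
    // Hypothetical helper: splits one CSV line while tracking
    // whether the current position is inside a quoted field.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // commas are ordinary characters here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // separator ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

The single inQuotes flag is the state that a plain split on commas lacks. A sketch like this does not cover multi-line fields or malformed input, which is one reason an existing CSV library is usually the safer choice.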

Jaco Pretorius