Hey,

I have quite a lot of strings (segments of SQL code, actually) in the following format:

('ABCDEFG', 123542, 'XYZ 99,9')

and I need to split this string, using C#, in order to get:

  • 'ABCDEFG'
  • 123542
  • 'XYZ 99,9'

I was originally using a simple Split(','), but the comma inside the last parameter wreaks havoc on the output, so I need a Regex instead. The problem is that I'm still quite noobish with regular expressions and I can't seem to crack the pattern, mainly because both numeric and alphanumeric parameters may appear at any position in the string...

What could I use to split that string on every comma outside the quotes? Cheers

+4  A: 

You could split on all commas that have an even number of quotes following them, using the following Regex to find them:

",(?=(?:[^']*'[^']*')*[^']*$)"

You'd use it like

var result = Regex.Split(samplestring, ",(?=(?:[^']*'[^']*')*[^']*$)");
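To make the behaviour concrete, here is a minimal sketch that applies the same pattern to both sample strings from this thread. The helper name QuotedCommaSplit, and the step that strips the outer parentheses and a trailing semicolon, are assumptions added for illustration; the pattern itself is unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedCommaSplit
{
    // A comma matches only if an even number of single quotes (zero counts as even)
    // lies between it and the end of the string, i.e. the comma is outside quotes.
    const string Pattern = ",(?=(?:[^']*'[^']*')*[^']*$)";

    public static string[] Split(string tuple)
    {
        // Assumption: input looks like "(a, b, c)" with an optional trailing ';'.
        string inner = tuple.Trim().TrimEnd(';').Trim();
        if (inner.Length >= 2 && inner[0] == '(' && inner[inner.Length - 1] == ')')
        {
            inner = inner.Substring(1, inner.Length - 2);
        }

        // Split on the unquoted commas and trim the whitespace around each piece.
        return Regex.Split(inner, Pattern).Select(p => p.Trim()).ToArray();
    }
}

static class RegexDemo
{
    static void Main()
    {
        foreach (string part in QuotedCommaSplit.Split("('ABCDEFG', 123542, 'XYZ 99,9')"))
        {
            Console.WriteLine(part);
        }
    }
}
```

For the question's sample this prints 'ABCDEFG', 123542 and 'XYZ 99,9' on three lines. The sample with a trailing number, ('qwerqwrqw', 'ODJQWPODKWPOQDKPWQO 9,99', 2174);, also yields three parts, because the last comma is followed by zero quotes.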
Jens
Almost works, but the comma may not have an even number of quotes following it; for example, if the last parameter is a number, it won't work. E.g.: ('qwerqwrqw', 'ODJQWPODKWPOQDKPWQO 9,99', 2174); This returns 'ODJQWPODKWPOQDKPWQO 9,99', 2174 as the last element.
Hal
@Hal: I edited this post a few times, since my original try had a bug. Please try the current version. Your example has an even number of quotes after each comma, if you count zero as even, which my expression does.
Jens
Sorry, but the "result" var still yields a string array with length 2:
result[0] -> 'qwerqwrqw'
result[1] -> 'ODJQWPODKWPOQDKPWQO 9,99', 2174
Hal
@Hal: Yeah, I should actually post the version I am working with here. =) Sorry and updated.
Jens
Works like a charm. Thanks a million Jens! =)
Hal
A: 

Try this (hacked from Jens's answer) in the Split method:

",(?:.*?'[^']*?')"

or just add question marks after the *'s in Jens's pattern, which makes them lazy rather than greedy.

FallingBullets
Still wrong:
result[0] -> 'qwerqwrqw'
result[1] -> , 2174
Hal
@Falling: You seem to be missing the point of Jens's regex. The part after the comma has to be a lookahead, and the lookahead has to account for all the remaining quotes. It has to be anchored with `$`, so non-greedy quantifiers are pointless, and it can't use `.` because that will make it lose count of the quotes.
Alan Moore
yeah, I realised that after.
FallingBullets
A: 

hi guys!

I like a challenge some of the time too, but this actually isn't one. Please read this article http://secretgeek.net/csv_trouble.asp and then go on and use http://www.filehelpers.com/

[Edit1, 3]: or maybe this article can help too (the link only shows some VB.Net sample code but still, you can use it with C# too!): http://msdn.microsoft.com/en-us/library/cakac7e6.aspx

I've tried to port the sample to C# (add a reference to Microsoft.VisualBasic to your project):

using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            TextReader reader = new StringReader("('ABCDEFG', 123542, 'XYZ 99,9')");

            // TextFieldParser is IDisposable, so wrap it in a using block.
            using (TextFieldParser fieldParser = new TextFieldParser(reader))
            {
                fieldParser.TextFieldType = FieldType.Delimited;
                fieldParser.SetDelimiters(",");

                while (!fieldParser.EndOfData)
                {
                    try
                    {
                        // Read one row and print each field on its own line.
                        string[] currentRow = fieldParser.ReadFields();

                        foreach (string currentField in currentRow)
                        {
                            Console.WriteLine(currentField);
                        }
                    }
                    catch (MalformedLineException e)
                    {
                        // Report the offending line number rather than the whole exception object.
                        Console.WriteLine("Line {0} is not valid and will be skipped.", e.LineNumber);
                    }
                }
            }
        }
    }
}
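One caveat: as far as I know, TextFieldParser only treats double quotes as field enclosures, so the single-quoted value in this sample may still be split at its inner comma. If that turns out to be a problem, a small hand-written scanner is another parser-style option. The following is only a sketch: the names TupleScanner and SplitOutsideQuotes are made up for illustration, and it assumes the outer parentheses have already been removed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TupleScanner
{
    // Splits on commas that are not inside single quotes.
    // A doubled quote ('') toggles the state twice, so it stays inside its field.
    public static List<string> SplitOutsideQuotes(string input)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == ',' && !inQuotes)
            {
                // Unquoted comma: close the current field and start a new one.
                fields.Add(current.ToString().Trim());
                current.Clear();
                continue;
            }

            if (c == '\'')
            {
                inQuotes = !inQuotes;
            }
            current.Append(c);
        }

        // Whatever remains after the last comma is the final field.
        fields.Add(current.ToString().Trim());
        return fields;
    }
}

static class ScannerDemo
{
    static void Main()
    {
        foreach (string field in TupleScanner.SplitOutsideQuotes("'ABCDEFG', 123542, 'XYZ 99,9'"))
        {
            Console.WriteLine(field);
        }
    }
}
```

The scanner keeps the quotes on each field, matching the output the question asks for, and makes a single pass over the string without any backtracking.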

[Edit2]: found another one which could be of help here: http://www.codeproject.com/KB/database/CsvReader.aspx

-- reinhard

pastacool
This isn't for CSVs, although Filehelpers looks interesting. Thanks
Hal
Although your sample string is not a CSV file, you could still look at it as one row from a CSV. I just wanted to point out that, just as many others have told people that RegEx is definitely not good for parsing HTML, for parsing CSV-like strings it's also better to use a parser/helper/whatever instead of plain RegEx.
pastacool
lol this is a C# project and I was looking for a solution in that same language, sorry but I'm not going to use VB (ffs). Thanks again
Hal
@Hal: just because the sample code is VB doesn't mean you can't use it in C# (add a reference to Microsoft.VisualBasic and add using Microsoft.VisualBasic.FileIO; and you're fine to use TextFieldParser)
pastacool