Hi all,

I'm horrible with regex, but I'm trying to figure out how an import function works and I came across this regex pattern. Maybe one of you can help me understand how it works.

string pattern = @"^""(?<code>.*)"",""(?<last_name>.*)"",""(?<first_name>.*)"",""(?<address>.*)"",""(?<city>.*)"",""(?<state>.*)"",""(?<zip>.*)""$";
Regex re = new Regex(pattern);
Match ma = re.Match(_sReader.ReadLine().Trim());

Thanks

+2  A: 

It looks like it's trying to split a comma-delimited string (with the fields having quotes around them) into separate fields with named groups. The (?<name>...) syntax captures the fields into named groups. The ^ indicates the match has to begin at the start of the string and the $ is the end-of-string anchor. The .* in each group says to capture everything (any character, zero or more times) that occurs between the double quotes.

Basically, it should parse the input CSV string into an array of strings that you can refer to by group name. You can reference the captured groups using ma.Groups[x] where x is an integer or you can use the group name. For example, ma.Groups["code"].
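
To make the group lookup concrete, here is a minimal sketch that runs the pattern from the question against one line. The sample record and the names NamedGroupDemo and ParseLine are made up for illustration; they are not from the original import code.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Same pattern as in the question. In a verbatim string (@"...")
    // each "" stands for one literal " character.
    const string Pattern =
        @"^""(?<code>.*)"",""(?<last_name>.*)"",""(?<first_name>.*)"",""(?<address>.*)"",""(?<city>.*)"",""(?<state>.*)"",""(?<zip>.*)""$";

    // Matches one line of the file; check Success before reading groups.
    public static Match ParseLine(string line)
    {
        return new Regex(Pattern).Match(line.Trim());
    }

    static void Main()
    {
        // Hypothetical sample record, not taken from the real import file.
        Match ma = ParseLine("\"A1\",\"Bond\",\"James\",\"1 Main St\",\"London\",\"NA\",\"00700\"");
        if (ma.Success)
        {
            Console.WriteLine(ma.Groups["last_name"].Value); // lookup by group name
            Console.WriteLine(ma.Groups[1].Value);           // lookup by number: group 1 is code
        }
    }
}
```

Group 0 is always the whole match, so the named groups here start at 1 in left-to-right order. If the line does not fit the pattern (for example, the zip is not quoted), Success is false and the groups are empty.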

TLiebe
So should the file being imported look something like this: "code","lastname","firstname","address","city","state",zip or like this: "code","lastname","firstname","address","city","state","zip"?
zSysop
"code","lastname","firstname","address","city","state","zip"
LnDCobra
The zip needs to be in quotes. Each "" in the pattern string is a single " in the original file. The (?<first_name> part of each set is noise; that's regex-speak for "take the next thing and call it first_name until you hit the )".
DevelopingChris
As LnDCobra and DevelopingChris said, all fields including the zip field should be in quotes.
TLiebe
+1  A: 

The way I read it, it's a flat-file record parser.
In this case it's a CSV with quotes.

And it makes you a dictionary of the fields, so that you can work with the CSV record more easily.

Instead of having to know that the 4th field is address in the code after this, you simply reference groups["address"] and get the 4th field.

There are more straightforward and generic ways to do this. This regex is going to be very fragile over time if the file is poorly delimited or if a quote is missing at the end of the file.
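
As a rough illustration of the fragility, the sketch below feeds a line with one field too many to the original pattern and to a stricter variant. The stricter pattern, which swaps each .* for [^"]*, is my own assumption and only holds if fields never contain an embedded double quote; the class and method names are hypothetical too.

```csharp
using System;
using System.Text.RegularExpressions;

static class StrictFieldDemo
{
    // Pattern from the question: .* is free to run across quotes and commas.
    public static readonly string Greedy =
        @"^""(?<code>.*)"",""(?<last_name>.*)"",""(?<first_name>.*)"",""(?<address>.*)"",""(?<city>.*)"",""(?<state>.*)"",""(?<zip>.*)""$";

    // Assumed stricter variant: [^"]* matches any run of characters
    // that are not a double quote, so a field cannot swallow a delimiter.
    public static readonly string Strict = Greedy.Replace(".*", "[^\"]*");

    // Returns the captured code field, or null when the line does not match.
    public static string CodeOf(string pattern, string line)
    {
        Match m = Regex.Match(line.Trim(), pattern);
        return m.Success ? m.Groups["code"].Value : null;
    }

    static void Main()
    {
        // Hypothetical badly delimited line: eight fields instead of seven.
        string eight = "\"a\",\"b\",\"c\",\"d\",\"e\",\"f\",\"g\",\"h\"";

        // The original pattern still matches and folds two fields into code.
        Console.WriteLine(CodeOf(Greedy, eight));

        // The stricter pattern rejects the line, so the problem is visible.
        Console.WriteLine(CodeOf(Strict, eight) ?? "(no match)");
    }
}
```

With the original pattern the bad line is accepted silently and code comes back as a","b. The stricter one fails to match, which at least tells you the record is malformed. For real CSV (escaped quotes, embedded commas) a dedicated CSV parser is probably the safer route.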

DevelopingChris
A: 

Divide and conquer works best with regex.

    string pattern = @"^""(?<last_name>.*)"",""(?<first_name>.*)""";

    Regex re = new Regex(pattern);

    //  INPUT: type the line with its double quotes (inverted commas) included, e.g.
    //  "("bond","james")"
    Match mm = re.Match(Console.ReadLine().Trim()); 

    Console.WriteLine("Last Name:"+mm.Groups["last_name"]);
    Console.WriteLine("First Name:"+mm.Groups["first_name"]);

OUTPUT:

 Last Name:("bond
 First Name:james")
TheMachineCharmer
Divide and conquer is a great strategy, until you try to run the end of this one and get the wrong thing, although this is much simpler.
DevelopingChris