I'm about to build a solution where I receive a semicolon-separated list every night. The list has around 14,000 rows, and I need to go through it and pick out some of the values. The document consists of around 50 semicolon-separated values for every "case". It is structured like this:

"";"2010-10-17";"";"";"";"Period-Last24h";"Problem is that the customer cant find...."; and so on, with 43 more semicolon-separated values. Every "case" ends with the value "Total 515";

What I need to do is go through all these "cases" and extract (withdraw) some of the values from each one. The "cases" are always built up in the same order, and I know that it's always the 3rd, 15th and 45th semicolon-separated values that I need.

How can I do this in the easiest way?

+2  A: 

I think you should decompose this problem into smaller problems. Here are the steps I'd take:

  1. Each semicolon-separated record represents a single object. C# is an object-oriented language. Stop thinking in terms of .csv records and start thinking in terms of objects. Break up the input into individual records.
  2. Given a single semicolon-separated record, the values represent the properties of your object. Give them meaningful names.
  3. Parse a semicolon-separated record into an object. When you're done, you'll have a collection of objects that you can deal with.
  4. Use C#'s collections and LINQ to filter your list based on those cases that you need to withdraw. When you're done, you'll have a collection of objects with the desired cases removed.
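
The steps above can be sketched roughly as follows. This is only an illustration: the `CaseRecord` and `CaseParser` types, the placeholder property names `Value3`, `Value15` and `Value45`, and the filter condition are all hypothetical, since the question does not say what the columns mean. The sketch also assumes that no field contains a ';' and that every record has at least 45 fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: the property names are placeholders to be replaced
// with meaningful ones once the meaning of each column is known.
public sealed class CaseRecord
{
    public string Value3 { get; set; } = "";
    public string Value15 { get; set; } = "";
    public string Value45 { get; set; } = "";
}

public static class CaseParser
{
    // Turns one semicolon-separated record into an object.
    // Assumes no field contains ';' and the record has at least 45 fields.
    public static CaseRecord Parse(string record)
    {
        string[] parts = record.Split(';');
        if (parts.Length < 45)
            throw new FormatException("Expected at least 45 fields, got " + parts.Length);

        return new CaseRecord
        {
            Value3 = parts[2].Trim('"'),    // strip the surrounding quotes
            Value15 = parts[14].Trim('"'),
            Value45 = parts[44].Trim('"')
        };
    }

    // Parses every non-blank record, then keeps only the cases whose third
    // value is non-empty. The filter condition is an arbitrary example.
    public static List<CaseRecord> ParseAndFilter(IEnumerable<string> records)
    {
        return records
            .Where(r => !string.IsNullOrWhiteSpace(r))
            .Select(r => Parse(r))
            .Where(c => c.Value3.Length > 0)
            .ToList();
    }
}
```

Something like `CaseParser.ParseAndFilter(File.ReadLines(path))` would then produce the filtered collection, assuming one record per line.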

Don't worry about the "easiest" way. You need one way that works. Whatever you do, get something working and worry about optimizing it to make it easiest, fastest, smallest, etc. later on.

duffymo
A: 

I'd search for an existing CSV library. The escaping rules probably don't map easily onto a regex.

If I were writing a library myself, I'd first parse each line into a list or array of strings. Then, in a second step (probably outside the CSV library itself), I'd convert the string list to a strongly typed object.
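
As one possible illustration of the library route, the sketch below uses `TextFieldParser` from `Microsoft.VisualBasic.FileIO`, which ships with .NET (on .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly). The answer does not name a library, so this choice is an assumption, and `CsvLibraryExample` and `ReadRows` are hypothetical names. It covers only the first step, turning the text into rows of raw strings; mapping `row[2]`, `row[14]` and `row[44]` onto a typed object would be the second step.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLibraryExample
{
    // Step 1: let an existing parser split the text into rows of raw strings.
    public static List<string[]> ReadRows(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            // With this flag a ';' inside a quoted field stays part of the field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing a `StreamReader` over the nightly file to `ReadRows` should give one `string[]` per row, with the surrounding quotes already removed.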

CodeInChaos
+1  A: 

Assuming the "rows" are lines and that you read line by line, your main tool should be string.Split:

foreach (string line in ... )
{
   // Split the line into its semicolon-separated values
   string[] parts = line.Split(';');
   string part3 = parts[2];    // 3rd value (arrays are zero-based)
   string part15 = parts[14];  // 15th value
   // etc
}

Note that this is a simple approach that will fail if the content of any column can contain ';'.
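
To make the caveat concrete: a line such as `"a";"b;c";"d"` has three columns, but a plain split on ';' yields four parts, so every index after the affected column shifts by one. A cheap safeguard, sketched below, is to compare the number of parts with the expected count before indexing. `SafeSplit` and `TrySplit` are hypothetical names, and the expected count is something the caller has to know from the file format.

```csharp
using System;

public static class SafeSplit
{
    // Splits on ';' and reports whether the line produced exactly the
    // expected number of parts. A mismatch suggests a ';' inside a column
    // or a truncated line.
    public static bool TrySplit(string line, int expectedCount, out string[] parts)
    {
        parts = line.Split(';');
        return parts.Length == expectedCount;
    }
}
```

Lines that fail the check can be logged or handed to a quote-aware parser instead of being indexed blindly.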

Henk Holterman
I second the line-by-line string.Split() approach. Note that it won't work if there are no CR or LF delimiters between lines; the OP didn't specify.
NightOwl888
+1  A: 

You could use String.Split twice.

The first time, use "Total 515"; as the split string, via the overload that takes a string array and a StringSplitOptions value. This gives you an array of cases.

The second time, use ';' as the split character, via the overload that takes a char array, on each of the cases. This gives you a data array for each case. As the data is consistent, you can extract the 3rd, 15th and 45th elements of this array.
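
A minimal sketch of the two splits, under the assumptions that the terminator is always the literal text `"Total 515";` (quotes and trailing semicolon included), that no value contains a ';', and that the whole input fits in one string. `SplitTwice` and `Extract` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SplitTwice
{
    // Returns the 3rd, 15th and 45th value (still quoted) of every case.
    public static List<string[]> Extract(string input)
    {
        // First split: cut the input into cases at the terminator.
        string[] terminator = { "\"Total 515\";" };
        string[] cases = input.Split(terminator, StringSplitOptions.RemoveEmptyEntries);

        var result = new List<string[]>();
        foreach (string c in cases)
        {
            // Second split: cut one case into its values.
            string[] values = c.Trim().Split(';');
            if (values.Length < 45)
                continue; // skip fragments such as a trailing newline

            result.Add(new[] { values[2], values[14], values[44] });
        }
        return result;
    }
}
```

If the number after "Total" varies from case to case, a fixed terminator string will not match and the first split needs a different strategy.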

ChrisF
A: 

A simple but slow approach would be to read single characters from the input (with the StringReader class, for example). Write a ReadItem method that reads an opening quote, continues to read until the next quote, and then looks at the character after it. If that character is a newline or a semicolon, one item has been read. If it is another quote, the pair stands for an escaped quote, so add a single quote to the item being read and keep going. Otherwise, throw an exception. Then use this method to split the input into a series of items, storing each line in, e.g., a string[] with one element per item and the lines in a List<>. You can then use this class from another class that decodes the raw strings into objects that you can get your data out of.
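
The ReadItem idea might look roughly like the sketch below. It assumes every item is enclosed in quotes and that a doubled quote inside an item stands for one literal quote. `QuotedReader` and `ReadAll` are hypothetical names, and tolerating a trailing ';' before the line break is an addition made to fit the sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class QuotedReader
{
    // Reads one quoted item plus the character that follows it.
    // endOfLine is true when the item was followed by a line break or the
    // end of input rather than a semicolon.
    public static string ReadItem(TextReader reader, out bool endOfLine)
    {
        if (reader.Read() != '"')
            throw new FormatException("Item must start with a quote.");

        var item = new StringBuilder();
        while (true)
        {
            int ch = reader.Read();
            if (ch == -1)
                throw new FormatException("Unterminated quoted item.");
            if (ch != '"')
            {
                item.Append((char)ch);
                continue;
            }

            // A quote was read: the next character decides what it means.
            int next = reader.Read();
            if (next == '"') { item.Append('"'); continue; } // escaped quote
            if (next == ';') { endOfLine = false; return item.ToString(); }
            if (next == '\r' || next == '\n' || next == -1)
            {
                endOfLine = true;
                return item.ToString();
            }
            throw new FormatException("Unexpected character after closing quote.");
        }
    }

    // Splits the whole input into lines, each an array of items.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var lines = new List<string[]>();
        var current = new List<string>();
        int peeked;
        while ((peeked = reader.Peek()) != -1)
        {
            if (peeked == '\r' || peeked == '\n')
            {
                // Line break after a trailing ';', the second half of CRLF,
                // or a blank line: close the current line if it has items.
                reader.Read();
                if (current.Count > 0) { lines.Add(current.ToArray()); current.Clear(); }
                continue;
            }

            bool endOfLine;
            current.Add(ReadItem(reader, out endOfLine));
            if (endOfLine) { lines.Add(current.ToArray()); current.Clear(); }
        }
        if (current.Count > 0) lines.Add(current.ToArray());
        return lines;
    }
}
```

Unlike a plain split, this keeps a ';' that appears inside a quoted item, at the cost of reading the input one character at a time.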

lbruder