tags:

views:

84

answers:

4

I am trying to import a file from my client's custom software. The file is basically a CSV file with some custom escape characters in it. I am reading the file in line by line, then splitting each line into a string[]. I am then assigning each element to a field in my custom object. For example:

Person.Name = line[0];
Person.Age = line[1];
Person.Height = line[2];

etc. The problem is, that some of the files I am importing are from an older version of the app and do not contain all the fields. So this line

Person.Height = line[2];

errors out because line.Length = 2 instead of 3.

Is there a "clean" way of solving this issue? I've gotten around it for now by writing an if statement before each assignment to make sure line[x] is valid, but that just seems kludgy to me.
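For reference, the guard-clause workaround I have now looks roughly like this (the Person fields are strings here just for illustration):

```csharp
using System;

class Person
{
    public string Name;
    public string Age;
    public string Height;
}

class Program
{
    static void Main()
    {
        // A row from the older file format: only two fields.
        string[] line = "Alice,30".Split(',');

        var person = new Person();
        // One length check per assignment -- this works, but feels kludgy.
        if (line.Length > 0) person.Name = line[0];
        if (line.Length > 1) person.Age = line[1];
        if (line.Length > 2) person.Height = line[2];

        Console.WriteLine(person.Height == null); // True: field missing in old format
    }
}
```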

+1  A: 

Ok, this is a little off the wall but it could work.

You could load your "line" array up into a stack of strings, and pop each item off the stack as you assign them to the fields. This assumes, of course, that if any items are "missing" then they're missing from the end of the line.

So here's the idea:

// Reverse() (System.Linq) puts line[0] on top of the stack --
// Stack<T>'s constructor pushes in order, which would leave the *last* field on top.
var fields = new Stack<string>(line.Reverse());
Person.Name = fields.PopOrDefault();
Person.Age = fields.PopOrDefault();
Person.Height = fields.PopOrDefault();

I'm using a "PopOrDefault" extension method because obviously the "Pop" method on Stack<T> will throw an exception if there are no more items. Here's the implementation for that (it's pretty straight-forward):

static class StackExtensions
{
    public static T PopOrDefault<T>(this Stack<T> stack)
    {
        if (stack.Count == 0) return default(T);
        return stack.Pop();
    }
}

So if any fields are missing, the property will get the default value for that type (in this case the default for string, which is null). You could even add a second parameter to "PopOrDefault" so you could specify your own default value.
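That second-parameter variant could look something like this (the fallback values are just examples):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StackExtensions
{
    // Overload that lets the caller supply a fallback instead of default(T).
    public static T PopOrDefault<T>(this Stack<T> stack, T fallback)
    {
        if (stack.Count == 0) return fallback;
        return stack.Pop();
    }
}

class Program
{
    static void Main()
    {
        string[] line = { "Alice", "30" }; // old-format row, Height missing
        // Reverse so the first column ends up on top of the stack.
        var fields = new Stack<string>(line.Reverse());

        Console.WriteLine(fields.PopOrDefault("?")); // Alice
        Console.WriteLine(fields.PopOrDefault("?")); // 30
        Console.WriteLine(fields.PopOrDefault("0")); // 0 (stack empty, fallback used)
    }
}
```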

Matt Hamilton
+2  A: 

I've got a couple suggestions, assuming you aren't parsing huge CSVs where every drop of performance counts.

One approach I use is to have a step before you actually start working with the data where you ensure that the file is consistently formatted. In your case, that would mean scanning each row to determine if all the columns are present and if not, inserting default values for the missing data.

This can help keep your "cleanup" code separate from your data processing. This actually takes a little more coding and would be slower from a performance perspective (since you're essentially parsing the file twice), but could help make your code a little easier to read and debug since you are breaking it into two separate activities.
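A minimal sketch of that cleanup pass (the expected column count and the padding value are assumptions; adjust them to your format):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Pad every row out to the expected column count with a default value,
    // so the data-processing code can assume a consistent shape.
    static string[][] Normalize(IEnumerable<string> rows, int columns, string pad = "")
    {
        return rows
            .Select(r => r.Split(','))
            .Select(f => f.Concat(Enumerable.Repeat(pad, Math.Max(0, columns - f.Length)))
                          .ToArray())
            .ToArray();
    }

    static void Main()
    {
        var rows = new[] { "Alice,30,170", "Bob,25" }; // mixed old/new format rows
        var clean = Normalize(rows, 3);

        Console.WriteLine(clean[1].Length);     // 3
        Console.WriteLine(clean[1][2] == "");   // True: padded with the default
    }
}
```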

An alternative would be to use a third-party library like LINQtoCSV to take care of marking "nullable" columns for you. Then you can refer to columns as named properties instead of by index.

Smashd
A: 

How many versions of the csv file are there? This seems like a simple answer, but could you do something like:

// e.g. old rows are Name,Age,Width; newer rows insert Height: Name,Age,Height,Width
const int OLD_VERSION_NUM_COLUMNS = 3;
bool isOlderVersion = line.Length == OLD_VERSION_NUM_COLUMNS;

Person.Name = line[0];
Person.Age = line[1];
Person.Height = isOlderVersion ? "0" : line[2];
Person.Width = isOlderVersion ? line[2] : line[3];

This isn't very efficient, and it will be hard to read if you have 40 columns, but the concept would definitely work.

Matthew Doyle
+1  A: 

You might also take a look at the open source library http://filehelpers.sourceforge.net/ for reading CSV files (and other file types). It can handle missing fields, and you can mark fields as optional.

Kenan