Hi,

I am working on a small project to take a CSV file and insert its data into an HTML table (I would use a DataGrid with a DataSet or DataTable, but the system I will be talking to does not support ASP.NET uploads for sending newsletters).

Anyway, I will use the File.ReadAllLines method to read the contents of the CSV file into a string array.

For each string in that array, I will use String.Split to break the line up into an array of values. The problem is that the CSV contents are makes of cars (the CSV file is generated by the system I talk to, by the way: I get data from this system and feed data into it). This means that I could have:

Nissan Almera

Nissan Almera 1.4 TDi

VW Golf 1.9 SE

And so forth...

Is there a robust way to ensure that where I have Almera 1.4 TDi, for example, it ends up as one member of the array I split each string into, rather than as separate members?

A: 

You will need to use a regular expression. There are several good tools for manipulating them: Expresso, Regulator to name just a few. I think your only other option is if the text fields are fixed width.

Mitch Wheat
A: 

The Split() method takes a char parameter that specifies the delimiter. So you can do something like:

    string[] fields = line.Split(Convert.ToChar(","));

Judging by your question all the car makes should be delimited by commas so this should work.
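To make that concrete, here is a minimal sketch of the comma split, assuming every line is plain comma-separated text with no quoted values and no commas inside a value. The class name, the `SplitLine` helper, and the sample lines are made up for illustration; in the real program the lines would come from File.ReadAllLines.

```csharp
using System;

static class CommaSplitSketch
{
    // Splits one line on commas and trims whitespace around each value.
    // Assumes no value contains a comma and nothing is wrapped in quotes.
    public static string[] SplitLine(string line)
    {
        string[] fields = line.Split(',');
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = fields[i].Trim();
        }
        return fields;
    }

    static void Main()
    {
        // Hypothetical sample lines standing in for File.ReadAllLines(path).
        string[] lines =
        {
            "Nissan Almera, Nissan Almera 1.4 TDi",
            "VW Golf 1.9 SE"
        };

        foreach (string line in lines)
        {
            foreach (string field in SplitLine(line))
            {
                // Spaces inside a value survive, because only ',' is a delimiter.
                Console.WriteLine("[" + field + "]");
            }
        }
    }
}
```

Because only the comma is treated as a delimiter, "Nissan Almera 1.4 TDi" comes back as a single element.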

Gary Willoughby
Uhh … `Convert.ToChar(",")`? Why not plain old `','`?
Konrad Rudolph
Readability. So there's no misunderstanding that the Split method needs a char. Some people don't realize single quotes give you a char.
Gary Willoughby
Those people should probably find another profession. You're introducing extra instructions for no reason.
Robert C. Barth
It would be interesting to see if it compiles to the same IL.
Gary Willoughby
A: 

I'm a bit daft when it comes to cars, but could you not specify the major brand as the delimiter, as opposed to spaces?

EG: Nissan Almera Nissan _X100_Ultra_Model Ford Prefect Toyota Foo Bar Honda Prius

Parsing on Major brands (Nissan, Ford, Toyota, Honda) would produce:

  • Nissan Almera
  • Nissan _X100_Ultra_Model
  • Ford Prefect
  • Toyota Foo Bar
  • Honda Prius
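
A rough sketch of that idea, assuming you keep your own list of single-word brand names (the `brands` set, the class name, and `SplitOnBrands` are all invented here). It walks the space-separated words and starts a new entry whenever it meets a known brand.

```csharp
using System;
using System.Collections.Generic;

static class BrandSplitSketch
{
    // Groups space-separated words into entries, starting a new entry at
    // every word that is a known brand. Assumes brands are single words
    // and that no model name is itself a brand name.
    public static List<string> SplitOnBrands(string text, ICollection<string> brands)
    {
        var entries = new List<string>();
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            if (entries.Count == 0 || brands.Contains(word))
            {
                entries.Add(word);                        // start a new make/model entry
            }
            else
            {
                entries[entries.Count - 1] += " " + word; // extend the current entry
            }
        }
        return entries;
    }

    static void Main()
    {
        var brands = new HashSet<string> { "Nissan", "Ford", "Toyota", "Honda" };
        string text = "Nissan Almera Nissan _X100_Ultra_Model Ford Prefect Toyota Foo Bar Honda Prius";
        foreach (string entry in SplitOnBrands(text, brands))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Two-word brands would need extra handling, so treat this as a starting point only.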
Gavin Miller
+3  A: 

Use the overloaded version of string.Split() that limits the number of returned values.

    string makeModel = csvArray[0]; // or whichever column it is in
    string[] makeAndModel = makeModel.Split( new char[] { ' ' } , 2 );
    string make = makeAndModel[0];
    string model = makeAndModel[1];
tvanfosson
I like this method also .. with the _MASSIVE_ assumption that the first word is a Make, and the rest is a Model. Otherwise, get the data cleaned up beforehand, delimiting it by the proper stuff (e.g. column 1 == make, column 2 == model, etc). gl!
Pure.Krome
A: 

I like the last two approaches!

However, one other problem is that the system is putting out several files, and I have to work with .txt files (so like csv, but no commas). This is the reason for this question.

I have told my colleagues about the problems this causes, and they will push to ensure all data produced by the system is .csv.

With that last approach, I can parse on car manufacturer. I would probably look up the manufacturer in a collection and see if it matches the manufacturer in the CSV. BTW, I could get the .csv files as separate files for each brand, but I'd like to read from one big list.

dotnetdev
"Like CSV, but no commas"...I suppose that's just a "separated value" file then? I think the commas are a rather important bit of a CSV file myself....
Mark Brackett
A: 

"You will need to use a regular expression."

I'm not so sure you need a regex, but you could solve the problem with one, and then you'd have 2 problems.

A 5 second Google search for "regex csv" yields a blog entry with this pattern:

    ,(?=([^"]*"[^"]*")*(?![^"]*"))

At first it looks like it does the trick: it skips commas inside quoted strings and matches only the commas that act as delimiters. Note that it matches the positions of the commas, not the fields themselves, but it should be pretty trivial to turn that into something useful, or at least it gives you a starting point.

Mind you, it fails miserably if you have an input string like

    123,456,"Unbalanced quote

where it doesn't match any commas.
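
For anyone who wants to poke at that behaviour, here is a small sketch that reports where the pattern matches. The class and method names are invented; the pattern is the one quoted above, written as a C# verbatim string (so each `"` is doubled).

```csharp
using System;
using System.Text.RegularExpressions;

static class CsvRegexSketch
{
    // Same pattern as quoted above, as a verbatim string literal.
    const string CommaOutsideQuotes = @",(?=([^""]*""[^""]*"")*(?![^""]*""))";

    // Returns the index of every comma the pattern treats as a delimiter.
    public static int[] DelimiterPositions(string line)
    {
        MatchCollection matches = Regex.Matches(line, CommaOutsideQuotes);
        int[] positions = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            positions[i] = matches[i].Index;
        }
        return positions;
    }

    static void Main()
    {
        // Balanced quotes: the comma inside "b,c" is skipped.
        Console.WriteLine(string.Join(" ", DelimiterPositions("a,\"b,c\",d"))); // prints "1 7"

        // Unbalanced quote: no comma is matched at all.
        Console.WriteLine(DelimiterPositions("123,456,\"Unbalanced quote").Length); // prints "0"
    }
}
```

One thing to watch if you pass this pattern to Regex.Split: .NET includes the text of capturing groups in the result, so you would probably want non-capturing groups or RegexOptions.ExplicitCapture there.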


Step 2: another Google search, this time for "c# split csv files":

CSV FILE PARSER AND WRITER IN C# (PART 3) (but check out parts 1 & 2 for the code)

It looks a lot more robust, and even has test cases.

Because there is no standard CSV format, you'll have to judge for yourself whether this works for the input files that you allow.

Robert Paulson
A: 

As I understand the issue:

  • The lines in the file being parsed are NOT CSV, they are space-delimited.
  • The value of the first field of each line (make/model) may contain 0 or more actual spaces.
  • The values of the other fields in each line contain no spaces, so a space delimiter works fine for them.

Let's say you have four columns, and the first column value is supposed to be "Nissan Almera 1.4 TDi". Using a normal Split() would result in 7 fields rather than 4.

(Untested code)

First, just split it:

    int numFields = 4;
    string[] myFields = myLine.Split(' ');

Then, fix the array:

    int extraSpaces = myFields.Length - numFields;
    if (extraSpaces > 0) {
      // Piece together element 0 in the array by adding the extra elements
      for (int n = 1; n <= extraSpaces; n++) {
        myFields[0] += " " + myFields[n];
      }
      // Move the other values back to elements 1, 2, and 3 of the array
      for (int n = 1; n < numFields; n++) {
        myFields[n] = myFields[n + extraSpaces];
      }
    }

Finally, ignore every element of the array beyond the four you actually wanted to parse.
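
The same split-then-merge idea can also be sketched so that it returns a clean array of exactly the expected size, using string.Join for the first field. `SplitTrailingFields` and the sample line are invented for illustration, and it still assumes only the first field can contain spaces.

```csharp
using System;

static class SpaceSplitSketch
{
    // Splits a space-delimited line into numFields values, where only the
    // first value may itself contain spaces.
    public static string[] SplitTrailingFields(string line, int numFields)
    {
        string[] parts = line.Split(' ');
        if (parts.Length < numFields)
        {
            throw new FormatException("Line has fewer than " + numFields + " fields.");
        }

        int extra = parts.Length - numFields;
        string[] result = new string[numFields];

        // The first (extra + 1) pieces all belong to the first field.
        result[0] = string.Join(" ", parts, 0, extra + 1);

        // The remaining pieces map one-to-one onto fields 1..numFields-1.
        Array.Copy(parts, extra + 1, result, 1, numFields - 1);
        return result;
    }

    static void Main()
    {
        // Hypothetical line: make/model, then three space-free values.
        string[] fields = SplitTrailingFields("Nissan Almera 1.4 TDi 2005 4995 4", 4);
        foreach (string field in fields)
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```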

Another approach would be regular expressions. I think something like this would work:

    // requires: using System.Text.RegularExpressions;
    Match m = Regex.Match(myLine, @"^(.*) ([^ ]+) ([^ ]+) ([^ ]+)$");
    // If m.Success is false, the line did not have at least four fields.
    string MakeModel = m.Groups[1].Value;
    string ModelYear = m.Groups[2].Value;
    string Price     = m.Groups[3].Value;
    string NumWheels = m.Groups[4].Value;

No splitting or arrays here, just Regex captured groups.

If there were a built-in String.Reverse() method (there's not), I might suggest using VB.NET's Replace() function with the Count parameter to mark the first three spaces (assuming four fields) in the reversed raw string, which are the last three spaces of the original, then reversing it again and splitting on the marker. Something like:

    // Reverse() is a hypothetical string-reversing helper; "|" is assumed not to appear in the data
    string marked = Reverse(Microsoft.VisualBasic.Strings.Replace(Reverse(myLine), " ", "|", 1, 3));
    string[] myFields = marked.Split('|');
richardtallent
A: 

As somebody else pointed out, String.Split() takes a parameter, so you can pass a ',' to split on that. It would not matter if you have spaces in values. Unless you are really sure that no value will ever contain a comma, though, I don't suggest doing that. Parsing CSV files is a bit trickier than it might seem initially (handling quotes and values containing commas), and I suggest using an existing library for that, like http://www.codeproject.com/KB/database/CsvReader.aspx.
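
To show why quotes make this trickier, here is a hand-rolled sketch of a quote-aware splitter for a single line. It is not the linked library's code; `SplitCsvLine` is invented, and it assumes the common convention that fields may be wrapped in double quotes and a doubled quote inside a quoted field means a literal quote. It does not handle quoted fields that span several lines, which is one more reason to prefer a tested library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedCsvSketch
{
    // Splits one CSV line on commas, ignoring commas inside double quotes.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote: literal quote character
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // delimiter: finish this field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing comma
        return fields;
    }

    static void Main()
    {
        foreach (string field in SplitCsvLine("VW,\"Golf, 1.9 SE\",Nissan Almera 1.4 TDi"))
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```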

Denis Fradlin