Hi,

I have this code to read a CSV file. It reads each line, divides each line by the delimiter ',' and stores the field values in the array 'strline()'.

How do I extract only required fields from the CSV file?

For example if I have a CSV File like

Type,Group,No,Sequence No,Row No,Date
0,Admin,3,345678,1,26052010
1,Staff,5,78654,3,26052010

I need only the values of the columns Group, Sequence No and Date.

Thanks in advance for any ideas.

Dim myStream As StreamReader = Nothing
' Hold the parsed data
Dim strlines() As String
Dim strline() As String
Try
  myStream = File.OpenText(OpenFile.FileName)
  If (myStream IsNot Nothing) Then
    ' Index of the line currently being processed
    Dim placeholder As Integer = 0
    ' Split on the platform line break and skip empty lines
    strlines = myStream.ReadToEnd().Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
    Do While placeholder < strlines.Length ' Stop after the last line of the CSV file
      strline = strlines(placeholder).Split(","c)
      placeholder += 1
    Loop
  End If
Catch ex As Exception
  LogErrorException(ex)
Finally
  If (myStream IsNot Nothing) Then
    myStream.Close()
  End If
End Try
+4  A: 

1) DO NOT USE String.Split!!

CSV data can contain commas inside a field, e.g.

id,name
1,foo
2,"hans, bar"

As the example above shows, you would also need to handle quoted fields, escaped quotes, and so on. See CSV Info for more details.

2) Check out TextFieldParser - it handles all this sort of thing.

It copes with the quoting and escaping cases that String.Split cannot handle.
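To make the difference concrete, here is a minimal sketch that parses the quoted line from the example above both ways. The module name is made up for illustration, and the parser reads from an in-memory StringReader instead of a file so the snippet is self-contained.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module SplitVersusParser
    Sub Main()
        ' The third line of the example above: 2,"hans, bar"
        Dim line As String = "2,""hans, bar"""

        ' Naive split: cuts inside the quoted field, so we get 3 pieces
        Dim naive As String() = line.Split(","c)
        Console.WriteLine(naive.Length)      ' 3

        ' TextFieldParser honours the quotes, so we get 2 fields
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            Dim fields As String() = parser.ReadFields()
            Console.WriteLine(fields.Length) ' 2
            Console.WriteLine(fields(1))     ' hans, bar
        End Using
    End Sub
End Module
```

The split version silently shifts every later column one place to the right, which is the kind of error that is easy to miss until the data is already imported.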

Sample from: http://msdn.microsoft.com/en-us/library/cakac7e6.aspx

Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\TestFolder\test.txt")
    MyReader.TextFieldType = FileIO.FieldType.Delimited
    MyReader.SetDelimiters(",")
    Dim currentRow As String()
    While Not MyReader.EndOfData
        Try
            currentRow = MyReader.ReadFields()
            Dim currentField As String
            For Each currentField In currentRow
                MsgBox(currentField)
            Next
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
        End Try
    End While
End Using

MyReader.ReadFields() returns the fields of the current row as an array of strings; from there you pick out the columns you need by index.
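As a rough sketch of that last step, applied to the columns from the question: read the header row first, look up the position of each wanted column by name, then keep only those positions from every data row. ReadColumns and the module name are hypothetical helpers, not part of TextFieldParser, and the sample data is fed from a StringReader; with a real file you would pass the path to the TextFieldParser constructor instead. It assumes the first row is a header and that every row has the same number of fields.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module CsvColumnDemo
    ' Hypothetical helper: returns, for each data row, only the wanted columns.
    Function ReadColumns(reader As TextReader, wanted As String()) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(reader)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            If parser.EndOfData Then Return rows

            ' Assumption: the first row holds the column names
            Dim header As String() = parser.ReadFields()
            Dim indexes(wanted.Length - 1) As Integer
            For i As Integer = 0 To wanted.Length - 1
                indexes(i) = Array.IndexOf(header, wanted(i))
                If indexes(i) < 0 Then
                    Throw New ArgumentException("Column not found: " & wanted(i))
                End If
            Next

            ' Keep only the wanted positions from each data row
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Dim picked(wanted.Length - 1) As String
                For i As Integer = 0 To wanted.Length - 1
                    picked(i) = fields(indexes(i))
                Next
                rows.Add(picked)
            End While
        End Using
        Return rows
    End Function

    Sub Main()
        Dim sample As String =
            "Type,Group,No,Sequence No,Row No,Date" & vbCrLf &
            "0,Admin,3,345678,1,26052010" & vbCrLf &
            "1,Staff,5,78654,3,26052010"

        Dim wanted As String() = {"Group", "Sequence No", "Date"}
        For Each row As String() In ReadColumns(New StringReader(sample), wanted)
            Console.WriteLine(String.Join(" | ", row))
        Next
    End Sub
End Module
```

For the sample data this prints "Admin | 345678 | 26052010" and "Staff | 78654 | 26052010". Looking columns up by header name rather than hard-coding indexes 1, 3 and 5 keeps the code working if the producer reorders the columns. The column name match is case-sensitive, and the sketch does not catch MalformedLineException, so add that handling as in the MSDN sample if your input may be malformed.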

PK :-)

Paul Kohler
+1 string.split is not only wrong, it's also extra slow compared to a state machine
Joel Coehoorn
+1 for yelling an important warning at the OP.
M.A. Hanin
I'd agree that it's a good warning about String.Split and quotes in general, but it all depends on where the data comes from and what the spec is. Most of the time when I've read CSV files with String.Split, if I ended up with too many columns I failed the file and told the producer of the file to fix their data.
ho1
I totally agree with "failing fast". The thing I do encourage though is if you are creating/consuming files of a particular format (in this case CSV) you should do your best to implement the defined format. The string split method only covers the basics of CSV and using something like the TextFieldParser class is only a couple more lines of code and you have full CSV reader support... And again - if my input file was different I would reject it quick too!
Paul Kohler
A: 

Maybe instead of only importing selected fields, you should import everything, then only use the ones you need.

Matthew Jones