I'm trying to import a tab-delimited file into a table.

The issue is that, sometimes, the file will include an awkward record that has two "null values", which causes my program to throw an "unexpected end of file" error.

For example, each record should have 20 fields, but the last record will have only two fields (two null values), hence the unexpected EOF.

Currently I'm using a StreamReader.

I've tried counting the lines and telling bcp to stop reading before the "phantom nulls", but StreamReader gets an incorrect count of lines due to the "phantom nulls".

I've tried the following code (borrowed off the net) to get rid of the bogus rows. But it just replaces the fields with empty spaces; I'd like the bogus line removed entirely, not left behind as a blank line.

Public Sub RemoveBlankRowsFromCVSFile2(ByVal filepath As String)
    If filepath = DBNull.Value.ToString() OrElse filepath.Length = 0 Then Throw New ArgumentNullException("filepath")

    If Not File.Exists(filepath) Then Throw New FileNotFoundException("Could not find CSV file.", filepath)

    Dim tempFile As String = Path.GetTempFileName()

    Using reader As New StreamReader(filepath)
        Using writer As New StreamWriter(tempFile)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                ' Only skips lines that consist of exactly one space
                If Not line.Equals(" ") Then writer.WriteLine(line)
                line = reader.ReadLine()
            End While
        End Using
    End Using

    ' Replace the original file with the filtered copy
    File.Delete(filepath)
    File.Move(tempFile, filepath)
End Sub

I've also tried using SSIS, but it encounters the same unexpected EOF error.

What am I doing wrong?

A: 

If you read the entire file into a string variable (using reader.ReadToEnd()), do you get the whole thing, or are you just getting the data up to those phantom nulls?

Have you tried using the reader.ReadBlock() function to try to read past the file length?
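
A quick sketch of that kind of check, if it helps (the path is just a placeholder, and counting Chr(0) characters is only one way of spotting where the nulls are):

Imports System.IO

Module ReadCheck
    Sub Main()
        ' Placeholder path - point this at the problem file
        Using reader As New StreamReader("c:\import.txt")
            ' ReadToEnd pulls everything, embedded nulls included
            Dim content As String = reader.ReadToEnd()

            Dim nullCount As Integer = 0
            For Each c As Char In content
                If c = ControlChars.NullChar Then nullCount += 1
            Next

            Console.WriteLine("Characters read: " & content.Length)
            Console.WriteLine("Null characters found: " & nullCount)
        End Using
    End Sub
End Module

If the character count looks right but the line-by-line reads stop early, the problem is in how the lines are being split rather than in the read itself.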

Jrud
A: 

At our company we do hundreds of imports every week. If a file is not sent in the correct, agreed-to format for our automated process, we return it to the sender. If the last line is wrong, the file should not be processed, because it might be missing information or be corrupt in some other way.

HLGEM
A: 

One way to avoid the error is to use File.ReadAllLines and then process the array of lines instead of stepping through the file. This is also a lot more efficient than a StreamReader loop.

Dim fileLines() As String
fileLines = File.ReadAllLines("c:\tmp.csv")
...
' writer here is the StreamWriter from the code in the question
For Each line As String In fileLines
    If line.Trim() <> "" Then writer.WriteLine(line)
Next

You can also save the output lines in the same or a different string array and use File.WriteAllLines to write the file all at once.
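
For example, something along these lines (the path is a placeholder):

Imports System.IO
Imports System.Collections.Generic

Module RewriteWithoutBlanks
    Sub Main()
        ' Placeholder path - point this at the problem file
        Dim filepath As String = "c:\tmp.csv"

        ' Collect only the lines that still have content after trimming
        Dim kept As New List(Of String)
        For Each line As String In File.ReadAllLines(filepath)
            If line.Trim() <> "" Then kept.Add(line)
        Next

        ' Write the surviving lines back in one call
        File.WriteAllLines(filepath, kept.ToArray())
    End Sub
End Module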

xpda
A: 

You could try the built-in .NET class for reading tab-delimited files: it is Microsoft.VisualBasic.FileIO.TextFieldParser.
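
Roughly like this for a tab-delimited file (the path is a placeholder; checking the field count is one way to catch the short last record):

Imports Microsoft.VisualBasic.FileIO

Module ParseTabDelimited
    Sub Main()
        ' Placeholder path - point this at the import file
        Using parser As New TextFieldParser("c:\import.txt")
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(vbTab)

            While Not parser.EndOfData
                Dim fields() As String = parser.ReadFields()
                ' Skip records that don't have the expected 20 fields
                If fields.Length = 20 Then
                    ' load the record...
                End If
            End While
        End Using
    End Sub
End Module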

MarkJ
A: 

This was solved using a bit array, checking one bit at a time for the suspect bit.

Idealflip