views:

1948

answers:

2

I am using the following method for reading Csv file content:

    /// <summary>
    /// Reads data from a CSV file to a datatable
    /// </summary>
    /// <param name="filePath">Path to the CSV file</param>
    /// <returns>Datatable filled with data read from the CSV file</returns>
    public DataTable ReadCsv(string filePath)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            log.Error("Invalid CSV file name.");
            return null;
        }

        try
        {
            DataTable dt = new DataTable();

            string folder = FileMngr.Instance.ExtractFileDir(filePath);
            string fileName = FileMngr.Instance.ExtractFileName(filePath);
            string connectionString = 
            string.Concat(@"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=",
            folder, ";");

            using (OdbcConnection conn = 
                   new System.Data.Odbc.OdbcConnection(connectionString))
            {
                string selectCommand = string.Concat("select * from [", fileName, "]");
                using (OdbcDataAdapter da = new OdbcDataAdapter(selectCommand, conn))
                {
                    da.Fill(dt);
                }
            }

            return dt;
        }
        catch (Exception ex)
        {
            log.Error("Error loading CSV content", ex);
            return null;
        }
    }

This method works if I have a UTF-8 encoded Csv file with a schema.ini that looks something like this:

[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
CharacterSet=ANSI

If I have German characters in a Csv file with Unicode encoding, the method cannot read the data correctly.

What modifications can I make to the above method to read Unicode Csv files? If there is no way to do it this way, what Csv-reading code can you suggest?

+3  A: 

Well, a very good and well-used streaming CSV reader is on CodeProject; that is the first thing I'd try... but it sounds like your encoding may be borked, which might not make it simple... of course, it could just be odbc that is breaking, in which case the above might work fine.

For simple CSV you could try parsing it yourself (string.Split etc), but there are enough edge-cases that a pre-rolled parser is worth using.

Marc Gravell
+1 for CsvReader :)
CMS
+7  A: 

Try using CharacterSet=UNICODE in your schema.ini file. Although this is not documented on MSDN it works according to this thread on Microsoft Forums.

csgero
Gotta love undocumented functionality...
Marc Gravell
This could be what I was looking for! I am trying it now
Germstorm
This thing worked! This is some very useful piece of information right here! Thank you!
Germstorm
I used this in my connection string as opposed to the schema.ini file, but it worked just the same.
Aaron Smith
@Aaron Smith, can u send me example of your connection string how do u use it without schema.ini please?
msony
@msony Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\;Extended Properties=""text;HDR=Yes;CharacterSet=UNICODE;FMT=Delimited"
Aaron Smith