In C#, I am reading SQL INSERT statements from a text file and then trying to execute them against a database using ADO.NET. Several of the queries include a Greek character in one of the columns: specifically Mu (the funky-looking u), which is used for microseconds (usec). The problem is that question marks are being inserted into the database instead (?sec). I have a feeling the problem is in the reading of the text file, though, because the debugger also shows a question mark inside a diamond. I am using the ReadLine method of a StreamReader. What am I doing wrong?

+2  A: 

The problem is almost certainly that you're using the wrong encoding when you read the file. Do you know what encoding your text file is actually in? StreamReader, like most .NET APIs, uses UTF-8 by default, but your file may be in the operating system's default encoding, which is what Encoding.Default represents on the .NET Framework (on .NET Core and later, Encoding.Default is always UTF-8). Try this:

using (StreamReader reader = new StreamReader(filename, Encoding.Default))
{
    ...
}
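
To see why the wrong encoding produces the question mark in a diamond, here is a minimal sketch. It assumes, purely for illustration, that the file was saved in a single-byte encoding such as ISO-8859-1, where the micro sign is the single byte 0xB5; the real encoding of the file in the question is unknown. That byte is not valid on its own in UTF-8, so a UTF-8 decoder substitutes U+FFFD, the replacement character. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Simulates the bytes of a file containing "µsec" saved as ISO-8859-1.
    // The choice of ISO-8859-1 is an assumption made for this example.
    public static byte[] SampleFileBytes()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetBytes("\u00B5sec"); // 0xB5 's' 'e' 'c'
    }

    // Decodes the same bytes with whichever encoding the caller picks.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    // Formats the first character as a code unit, e.g. "U+00B5".
    public static string FirstCharAsHex(string text)
    {
        return string.Format("U+{0:X4}", (int)text[0]);
    }
}
```

Decoding `SampleFileBytes()` with `Encoding.UTF8` gives a string whose first character is U+FFFD, while decoding the same bytes with `Encoding.GetEncoding("iso-8859-1")` gives U+00B5 followed by "sec". Once the replacement character is in the string, the original byte is gone, which is why the fix has to happen at the point where the file is read.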

I also strongly recommend that you try to get it working without touching the database. Read in the file, then print out each character in the string along with its Unicode value in hex:

public static void DumpString(string text)
{
    Console.WriteLine("Text: '{0}'", text);
    foreach (char c in text)
    {
        Console.WriteLine("{0}: U+{1:x4}", c, (int) c);
    }
}

If this gives the right results, then try to insert it into the database. That way, if the database still looks "wrong" afterwards, you know that the problem is with your database access instead of the file access.

Jon Skeet
That seemed to do the trick, but I'm not sure I understand this whole encoding thing. What if another user with a different default encoding modifies the file, and I then rerun my app with the new file? Will it no longer work? Should I use Encoding.UTF8 instead?
bsh152s
Using UTF-8 is a much better idea, yes - but you need to make sure you always know what the encoding really is. Can you ensure that the file will *always* be saved as UTF-8?
Jon Skeet
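
If the file is supposed to be UTF-8 but that cannot be guaranteed, one option is to make the reader fail loudly rather than silently produce replacement characters. The following is a minimal sketch of that idea, not a definitive implementation: it uses the `UTF8Encoding` constructor with `throwOnInvalidBytes` set to true, so a file saved in some other encoding raises a `DecoderFallbackException` as soon as an invalid byte sequence is read. The class name `StrictUtf8Reader` and its method are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StrictUtf8Reader
{
    // Reads every line of the file as UTF-8. Invalid byte sequences throw
    // DecoderFallbackException instead of being replaced with U+FFFD.
    public static List<string> ReadLines(string path)
    {
        // Arguments: encoderShouldEmitUTF8Identifier = false,
        //            throwOnInvalidBytes = true.
        var strictUtf8 = new UTF8Encoding(false, true);
        var lines = new List<string>();

        using (var reader = new StreamReader(path, strictUtf8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }

        return lines;
    }
}
```

A caller can catch `DecoderFallbackException` around `StrictUtf8Reader.ReadLines(path)` and report that the file is not valid UTF-8, which is usually easier to diagnose than question marks turning up in the database later.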
+1  A: 

You need to check three things:

  1. The encoding used when you open the StreamReader
  2. The column type on the database server (nvarchar rather than varchar)
  3. The collation in effect for the column

If any of these is wrong, you'll get the wrong value when you read the data back from the DB.

Joel Coehoorn