views:

215

answers:

3

I've got some problems on a .NET site where both Latin (åäö) and Latvian (āē) characters would need to co-exist. In the database, I'm using nvarchar and ntext for my fields, and if I enter the texts directly into the database, everything works fine.

The problem is when I enter the exact same text using a form that does an SQL UPDATE, the Latvian chars get converted to similar Latin chars, so the string "åäöāē" would be saved as "åäöae".

I've tried a bit with different CodePages, but I'm not yet getting the results I want. Does anyone have any suggestions?

A: 

Windows Forms apps

When entering text into a windows forms textbox you need to use a unicode font such as Arial Unicode MS.

Web page

Make sure you are using UTF-8 as your response codepage. Great article from Microsoft on exactly this http://support.microsoft.com/kb/893663

UTF-8 will correctly encode any characters. Keep in mind, NText and NVarChar in your DB are UTF-16 datatypes, so viewing the data from Query Analyser might will show it correctly.

In your SQL

If you're constructing the SQL dynamically, make sure you use the N prefix e.g.

INSERT INTO TABLE (Name, Number) VALUES (N'MyName', 1)
badbod99
+1  A: 

Work out what's going on by taking each piece of the pipeline separately.

See my article on debugging unicode problems and diagnose where things are going wrong. Places that you may be seeing problems:

  • Getting the data from the user
  • Updating the database
  • Fetching from the database
  • Displaying the data you've fetched

Check each of these independently, never relying on any particular font etc. Print out the Unicode code points involved (as per the article).

Oh, and wherever you get to specify it, use an encoding which can encode everything - I'd suggest UTF-8.

Jon Skeet
UTF-8 can encode everything, but will only display it correctly with the correct codepage.
badbod99
@badbod99: What do you mean by "with the correct code page"? Certainly anything decoding the encoded data needs to know which encoding is involved... and if you're displaying glyphs, the font needs to be correct.
Jon Skeet
You're right :-) codepage isn't relevant for UTF-8. My bad.
badbod99
+2  A: 

Is your SQL statement passing the value in to the db as a unicode value?

e.g. basic example

INSERT YourTable VALUES ('Unicode Value')

should be:

INSERT YourTable VALUES (N'Unicode Value')

Obviously I'd recommend you use parameterised SQL/stored procedure and define the parameter as an NVARCHAR, but you get what I'm saying.

AdaTheDev
Just curious: why recommend stored procedures?
celopes
Main point is using a parameterised statement. I personally like stored procedures for: maintainability, tuneability (without needing to change / redeploy code), security (no need to grant explicit permissions to underlying tables, just EXECUTE permissions on the SPROC)....
AdaTheDev