I am storing Chinese and English text in an SQL Server 2005 database and displaying it on a webpage, but the Chinese is not being displayed correctly. I have been reading about the subject and have done the following:

  • used N before the text in my INSERT statement
  • set the field type to nvarchar
  • set the charset of the page to UTF-8

Chinese characters are displayed correctly when I put them directly into the page, i.e. when they don't come from the database.

These are the characters that should be displayed: 全澳甲流确诊病例已破100

This is what is displayed when the text is retrieved from the database: å…¨æ¾³ç”²æµç¡®è¯Šç—…ä¾‹å·²ç ´1001

This seems to be related to how strings are handled in C#, because the Chinese text is retrieved and displayed correctly in classic ASP.

Is there anything else I need to do to get the data out of the database, into a string and output correctly on an aspx page?

A: 

This is definitely an encoding problem at some point in the round trip from the database to the C# string, but from the sound of it you're doing everything correctly.

For our database we store Unicode data in NVARCHAR() columns and then read them out to normal C# strings; no text encoding changes were necessary. What kind of data objects are you using (e.g. DataSets, just a DataReader, LINQ to SQL)?

In our application we read the results of the stored procedure using FetchDataSet, and then do a DataBinder.Eval() to assign the string that is eventually the text of a label.

Will Charczuk
I am using a DataSet and casting the row to a string
Sean
+1  A: 

How are the characters getting into the database? Are you entering them via a stored proc? Make sure the parameters on your stored proc are nvarchar, and that the parameters on the command object you call the proc from are nvarchar too.

Update: the consensus on the thread is that the database doesn't have properly encoded NVARCHAR content. Here's my latest theory: the database holds the UTF-8 bytes. These bytes remain untouched when they are output from ASP. ASP.NET takes the UTF-8 bytes and interprets them as single-byte characters.

Try getting the bytes out of the database and decoding them as UTF-8, e.g.:

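SqlCommand command = new SqlCommand("SELECT zhtext FROM TestTable", connection);
// An nvarchar column comes back as a string, so casting to byte[] would throw.
// Assumption: each stored character holds one Windows-1252 byte of the UTF-8 data.
string garbled = (string)command.ExecuteScalar();
byte[] byteArray = Encoding.GetEncoding(1252).GetBytes(garbled);
lblText.Text = Encoding.UTF8.GetString(byteArray);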
russau
They are inserted with an INSERT query not a stored proc
Sean
Try turning the INSERT into a parametrized query: SqlCommand command = connection.CreateCommand(); command.CommandText = "INSERT INTO Info (ZhText) VALUES (@ZhText)"; command.Parameters.Add(new SqlParameter("@ZhText", SqlDbType.NVarChar)).Value = customerID;
russau
I don't think this is the issue, because the characters are displayed in our older ASP software without a problem. The problem is somehow related to how I am handling the retrieved data in .NET.
Sean
If the characters appear broken in your database, then something is going wrong getting them in there. Do the characters saved by the older software look okay in SQL Management Studio? It's worth trying the parameter approach above; at least you can rule that explanation out if it doesn't work.
russau
another thought: if the characters are entered into the database via another means (i.e from the old software, in Management Studio) - do they display okay?
russau
They are inserted with the old software, so they display as in the second example. Both the ASP and the .NET code retrieve the data, but only the ASP inserts it. I tried inserting directly into the database using SQL Management Studio; the characters that end up displayed are different and don't display correctly in the ASP. I will try changing the query type.
Sean
Let me know if I'm barking up the wrong tree here. I'm unclear on what does and doesn't work: Text is ALWAYS entered in ASP? Text can be DISPLAYED in ASP, but not in ASP.NET? Text ALWAYS looks like garbage in Management Studio?
russau
Yes, you are correct. Text is ALWAYS entered in ASP. Text can be DISPLAYED in ASP, but not in ASP.NET. Text ALWAYS looks like garbage in Management Studio. Thanks for your help on this.
Sean
Okay, then my assumptions about the parameters are probably wrong. If I do this in Management Studio I see the Chinese characters fine: CREATE TABLE zhtesting (zhtext nvarchar(max)); INSERT INTO zhtesting (zhtext) VALUES (N'本港首次有'); SELECT * FROM zhtesting;
russau
I just ran your queries in management studio and the characters are not displayed at all (replaced with squares). Odd. Could this be a database setting or system setting somewhere?
Sean
hi sean - i've updated my answer with another *theory*. not sure if stackoverflow informs you of these sorts of updates?
russau
A: 

Have you installed the "support for eastern languages" in Windows? Is it XP? If that's the case, your data might be fine and SQL Management Studio just doesn't show it properly. (All TrueType fonts display OK even without the "support for Chinese", but system fonts don't.)

Ali Shafai
NOTE: "Chinese characters are being displayed in the page correctly when I insert them directly into the page i.e. don't get them from the database"
russau
+4  A: 

So far the information is:

  1. You are using a direct SQL INSERT statement to insert into the database.
  2. The data appears broken in the database.

The problem might lie in two places:

  1. In your INSERT statement, did you prefix the insert value with N?

    INSERT INTO #tmp VALUES (N'全澳甲流确诊病例已破100')

  2. If you prefix the value with N, does the String object hold the correct data?

    String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";

Here I assume value is a String object.

Does this String object hold the correct Chinese characters?

Try printing its value to check.

Updated:

Let's assume the INSERT query is constructed as below:

String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";

I assume value holds the Chinese character.

Did you assign the Chinese characters into value directly? Like

String value = "全澳甲流确诊病例已破100";

The above code should work. However, any intermediate processing of the string can cause problems.

I worked on a localized Traditional Chinese (TC) project before; the previous architect had added several encoding conversions that are necessary in ASP but cause problems in .NET:

  String value = "全澳甲流确诊病例已破100";
  Encoding tc = Encoding.GetEncoding("BIG5");
  byte[] bytes = tc.GetBytes(value);
  value = Encoding.Unicode.GetString(bytes);

The above conversions are unnecessary. In .NET, a simple direct assignment works:

  String value = "全澳甲流确诊病例已破100";

That is because String constants and String objects in .NET are already Unicode.

Framework libraries such as file I/O convert a non-Unicode source encoding to Unicode as they read, so the framework does this dirty work for you. Most of the time you do not need to convert encodings manually.
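To make the point concrete, here is a minimal sketch of why such a conversion corrupts the text. It is an illustration only: it swaps BIG5 for UTF-8 so that it runs without extra code page support, and the class and method names are hypothetical. Encoding a string and then decoding the bytes with a different encoding produces a different string, while a matched round trip returns the original.

```csharp
using System;
using System.Text;

public static class ConversionDemo
{
    // Mimics the unnecessary conversion: encode with one encoding,
    // then decode the resulting bytes with a different one.
    public static string MismatchedRoundTrip(string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        return Encoding.Unicode.GetString(bytes); // UTF-8 bytes misread as UTF-16
    }

    // Encoding and decoding with the same encoding gives back the same text.
    public static string MatchedRoundTrip(string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string value = "全澳甲流确诊病例已破100";
        Console.WriteLine(MismatchedRoundTrip(value) == value); // False
        Console.WriteLine(MatchedRoundTrip(value) == value);    // True
    }
}
```

If the value never leaves a String, no conversion step is needed at all.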

Update: Understood that ASP is used to insert data into an SQL server.

I have written a small piece of ASP to insert some Chinese chars into SQL database and it works.

I have a database named "trans" and I created a table "temp" inside. The ASP page is encoded in UTF-8.

<html>
<head>
<title>Untitled</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
<script language="vbscript" runat="server">

If Request.Form("Button1") = "Submit" Then

    SqlQuery = "INSERT INTO trans..temp VALUES (N'" + Request.Form("Text1") + "')"

    Set cn = Server.CreateObject("ADODB.Connection")
    cn.Provider = "sqloledb"
    cn.Properties("Data Source").Value = *********
    cn.Properties("Initial Catalog").Value = "TRANS"
    cn.Properties("User ID").Value = "sa"
    cn.Properties("Password").Value = **********
    cn.Properties("Persist Security Info").Value = False

    cn.Open
    cn.Execute(SqlQuery)
    cn.Close

    Set cn = Nothing

    Response.Write SqlQuery
End If

</script>
<form name="form1" method="post" action="input.asp">
    <input name="Text1" type="text" />
    <input name="Button1" value="Submit" type="submit" />
</form>        
</body>
</html>

The table is defined as follows in my database:

 create table temp (data NVARCHAR(100))

After submitting the ASP page several times, my table contains proper Chinese data:

select * from trans..temp

data
----------------
test
测试
全澳甲流确诊病例已破100

Hope this can help.

yinyueyouge
The insertion to the database is done in asp not .Net. The retrieval is done in .Net. Do you know of anything special that has to be done in the asp?
Sean
This answer saved my life :) I didn't know the usage of N, like "values (N'......') ". Thank you very much.
Emre
A: 

The summary for me looks like:

  • characters displayed correctly in ASP
  • characters displayed garbled in SSMS
  • characters displayed garbled in ASP.Net

Conclusion: the data in the database is not encoded correctly, and you need to migrate it to Unicode to deal with it in C#, just as Ryan sketched.
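A rough sketch of what such a repair could look like, assuming the stored text is UTF-8 whose bytes were each read as one single-byte character. The `MojibakeRepair` class and `Repair` method are hypothetical names, and the demo uses ISO-8859-1 because it maps all 256 byte values and is available everywhere. The garbled sample in the question contains characters such as "…" that suggest Windows-1252, so real data would probably need that code page passed in instead (on .NET Core this requires registering the code pages encoding provider first).

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Reverses a mis-decoding: turn each character back into the single byte
    // it came from, then decode those bytes as UTF-8.
    public static string Repair(string garbled, Encoding singleByte)
    {
        byte[] raw = singleByte.GetBytes(garbled);
        return Encoding.UTF8.GetString(raw);
    }

    public static void Main()
    {
        // ISO-8859-1 maps every byte 0-255 to exactly one character.
        // Assumption: the real data may need code page 1252 instead.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        string original = "全澳甲流确诊病例已破100";
        // Simulate the damage: UTF-8 bytes read as single-byte characters.
        string garbled = latin1.GetString(Encoding.UTF8.GetBytes(original));

        Console.WriteLine(Repair(garbled, latin1) == original); // True
    }
}
```

A one-off migration could apply a function like this to each affected row and write the result back through an NVARCHAR parameter, after checking a few rows by eye first.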

devio
A: 

If I write "places" in a webpage textbox, it shows as 怕册 and is stored in the SQL Server database in the same Chinese font. I use N'value' to store it. But how do I do string manipulation on it later? For example, I want the 3 characters "pla" from "places", which is stored in the database, and then I need to display those 3 characters, converted to Chinese, in a text field.

newname