I am trying to allow users to enter Hebrew characters into certain fields in an HTML form (processed using Java). From my research, it appears that the following tag needs to be part of the HTML document:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

With that in place, I get the following result: when the user enters Hebrew text in the input field, it saves and displays on the screen properly, in Hebrew. However, if I view the data in the database, it is unintelligible. Furthermore, if I try to output it to a file (using iText), it is gibberish. Conversely, if I enter the data directly into the database, it is readable in Hebrew in the database and in the output file, but it is gibberish on the screen.

Sample: If the user entered it in the browser, it appears like this: עִבְרִית

The same string, when inputted in the database, appears like this on the screen: �Ѱ���

When looking at the database, the browser-inputted string looks like this: עִבְרִית

The manually entered string appears like this: עִבְרִית (although it appears left-to-right in the database, whereas Hebrew is a right-to-left language; when copied and pasted here, it appears correctly, right-to-left)

Obviously, the database and browser are not "talking" the same language with this encoding. I am using SQL Server and did not make any changes to the database, other than ensuring that the field in question is defined as an nvarchar field. What am I missing?

+1  A: 

It sounds like the database encoding is not set correctly. If the database is only expecting ISO-8859-1 (a common default encoding), it will try to treat the UTF-8 data as ISO-8859-1. That often garbles any non-ASCII characters.
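
To illustrate the kind of mismatch described here, below is a minimal Java sketch. It assumes the problem is UTF-8 bytes being read as ISO-8859-1, which may not be exactly what happens in the asker's setup; the class and method names are hypothetical. It encodes a Hebrew word as UTF-8 and then decodes those bytes with the wrong charset. Each Hebrew letter takes two bytes in UTF-8, so the wrongly decoded string has twice as many characters, none of them Hebrew.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes the bytes with the wrong
    // charset (ISO-8859-1), which maps every byte to one character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Hebrew word for "Hebrew" without vowel points, written with
        // Unicode escapes so the source file encoding does not matter.
        String hebrew = "\u05E2\u05D1\u05E8\u05D9\u05EA";
        String garbled = misdecode(hebrew);
        // prints "5 chars in, 10 chars out"
        System.out.println(hebrew.length() + " chars in, " + garbled.length() + " chars out");
    }
}
```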

Here is an article from MS on the issue: http://support.microsoft.com/kb/232580

Larry

Larry K
Thanks for the help and link. I read the article, and it explained the problem, but I still can't figure out how to fix it. How can I either change the HTML to use UTF-16 (UCS-2, which the article claims is used by SQL Server 2000) or get SQL Server to understand UTF-8? I tried to translate the string to UTF-16, but it doesn't seem to work.
twpc
Did you try setting your db fields to be type BINARY/VARBINARY/IMAGE as the MS article suggests (option 4)?
Larry K
Yes, I did, but it didn't make the data readable in the database. However, I found an article that contains the code necessary to convert the data to and from UTF-8. Interestingly enough, the conversion that worked was between UTF-8 and ISO-8859-1, not between UTF-8 and UTF-16, as was implied in the article from MS above. This is a must-read for anyone having the same issue and using Java: http://www.jguru.com/faq/view.jsp?EID=137049
twpc
Note: read the comments section in the above article. That's where the real help is.
twpc
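
For readers who cannot reach the linked article, here is a minimal sketch of the commonly used Java idiom for converting between UTF-8 and ISO-8859-1. It is not copied from the jGuru page, and whether it matches the code there is an assumption; the class and method names are hypothetical. The idea is that ISO-8859-1 maps every byte value to exactly one character, so a string that was decoded with it still carries the original bytes and can be decoded again as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class EncodingRepair {

    // Takes a string whose characters each hold one raw byte (UTF-8 data
    // that was decoded as ISO-8859-1) and decodes those bytes as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // The reverse direction: exposes the UTF-8 bytes of the text as a
    // string of ISO-8859-1 characters, one character per byte.
    static String utf8ToLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hebrew sample text written with Unicode escapes.
        String hebrew = "\u05E2\u05D1\u05E8\u05D9\u05EA";
        String stored = utf8ToLatin1(hebrew);     // garbled, one char per byte
        String restored = latin1ToUtf8(stored);   // back to the original text
        System.out.println(restored.equals(hebrew)); // prints "true"
    }
}
```

This kind of conversion only works around the mismatch. Which direction is needed when reading and when writing depends on where in the pipeline the wrong charset is applied, so treat the method names above as illustrative.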