Hi,

When a user enters an Arabic name and password, I need to retrieve that user's data from the database as Arabic text. I am using UTF-8 encoding in Java. The database is SQL Server 2005, and the column is defined as varchar instead of nvarchar. Because the database is in production, I cannot change the column type to nvarchar. Is there any way to convert the "????" text retrieved from the database back into Arabic text without making changes to the DB? Can anyone help with this?

Bhoomesh

+1  A: 

I do not know much about Java, but your problem is more general.

I think that if the characters were stored using an incorrect encoding (the column is ANSI and the input is UTF-8), then even if there is a way to convert them back to UTF-8, I would not recommend it. It is better to solve this on the database side.

Ahmed Said
+2  A: 

Not sure, but you can probably convert the UTF-8 text to ANSI before storing it in the varchar column, and convert it back to UTF-8 when reading it.

I don't know whether it's possible, but try googling.
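
For illustration, here is a minimal sketch of that round trip in plain Java, with no database involved. The class and method names (`Utf8RoundTrip`, `pack`, `unpack`) are made up for this example, and ISO-8859-1 stands in for "ANSI" only because Java maps each of its 256 byte values to exactly one character. A real varchar column with, say, a Windows-1252 code page may not preserve every one of those characters, so treat this as a demonstration of the idea, not a tested fix.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Reinterpret the UTF-8 bytes of the text as ISO-8859-1 characters,
    // so every byte becomes one char in the range 0..255.
    static String pack(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse the disguise: turn each char back into a byte and decode as UTF-8.
    static String unpack(String stored) {
        return new String(stored.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters
        String packed = pack(name);
        System.out.println(packed.length());             // 10: each letter takes two bytes in UTF-8
        System.out.println(unpack(packed).equals(name)); // true
    }
}
```

Note that the packed string contains characters in the range U+0080 to U+009F, which many single-byte code pages cannot represent, and that this only applies to text written through `pack` in the first place. Rows that already contain "????" hold no Arabic data to unpack.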

Bhushan
Thanks Bhushan, I will try it and get back to you.
You mean convert it using new String(arabicText.getBytes("UTF-8"), "ISO-8859-1"). I don't know the position of the Arabic code points in the Unicode table, but I guess it could triple the text size.
kd304
Icebob
But the driver should be doing this automatically if the database is actually using an Arabic code page. I strongly suspect that it's using a non-Arabic code page. Hacking around "pretending" that you've got text in one encoding when it really isn't is a fundamentally *bad idea*. Just say no.
Jon Skeet
A: 

You can try using the N prefix when inserting strings into the DB, like this: N'some text'.

iburlakov
As I understand it, that will make the text an nvarchar, which isn't going to help much when the field type itself is varchar. I could be missing something though...
Jon Skeet
+2  A: 

How were you storing the text in the database in the first place? Fundamentally, it sounds like you should be changing the database. If you're using varchar with a non-Unicode collation, then you're either limited to storing text which fits with that collation or you've got to use horribly unreliable and fundamentally "wrong" conversions which treat text as binary data.

Even if it's possible to do this for some cases, you may well find that there will be other cases which simply fail.

Talk to your database administrators, explaining that the schema is fundamentally wrong - in order to support full Unicode, you require a schema change. You'll have to go through this eventually, so you might as well save yourself nasty hacks in the meantime.
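
To see why the existing "????" values cannot simply be converted back, here is a small Java sketch of what happens when Arabic text is pushed through a non-Arabic single-byte code page. The class and method names are invented for this example, and windows-1252 is only an assumption about what the column's collation might use; the question does not say.

```java
import java.nio.charset.Charset;

public class LossyCodePage {
    // Encode the text with the given charset and decode it again,
    // roughly what happens when a value is written to and read from
    // a varchar column that uses that code page.
    static String throughCodePage(String text, String codePage) {
        Charset cs = Charset.forName(codePage);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String name = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters
        // Assumed code page; characters it cannot represent are replaced with '?'.
        System.out.println(throughCodePage(name, "windows-1252")); // ?????
    }
}
```

Every letter collapses to the same '?' byte, so nothing in the stored value distinguishes one Arabic letter from another, and no decoding on the Java side can bring the original text back.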

Jon Skeet
A: 

There's also UTF-7, which uses only US-ASCII characters. Using this, you might avoid code page problems.

TonJ
You can also look at RFC 2152, http://tools.ietf.org/html/rfc2152. It has a nice explanation.
TonJ
UTF-7 isn't supported in the Sun Java 6 implementation, so a 3rd party encoder might be required to do this. http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
McDowell