Sorry for asking basic questions here. Pardon me.

I have a string made up of Unicode escape sequences, as follows.

String unicode = "\u8BF7\u5728\u6B64\u5904\u8F93\u5165\u4EA7\u54C1\u7F16\u53F7\u6216\u540D\u79F0";

How can I convert this to Chinese text or to UTF-8 text?

+3  A: 

The String itself will always be in Unicode; I'm not sure what you mean by "convert this to Chinese text" but to convert it to the binary representation using UTF-8 you'd use:

byte[] bytes = unicode.getBytes("UTF-8");

Or you can pass a Charset instead of a name. Using the Guava library, for example, you'd just use:

byte[] bytes = unicode.getBytes(Charsets.UTF_8);

(This gets round the brittleness of specifying a string, and avoids worrying about catching UnsupportedEncodingException.)

Or you can declare:

final static Charset UTF_8 = Charset.forName("UTF-8");

at the top of your class, which avoids pulling in a whole library just to get rid of the string name.
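As a rough sketch of the points above: the compiler turns each \uXXXX escape into a single char, so the string from the question already holds the Chinese text, and encoding it only changes its byte representation. The class and method names here (UnicodeEscapeDemo, toUtf8, fromUtf8) are made up for illustration, and the sketch assumes Java 7 or later, where java.nio.charset.StandardCharsets is available; on Java 1.6 the Charset.forName("UTF-8") constant shown above would take its place.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not from the thread.
class UnicodeEscapeDemo {
    // Same literal as in the question. Each \uXXXX escape becomes one char
    // at compile time, so this already is the Chinese text.
    static final String ESCAPED =
        "\u8BF7\u5728\u6B64\u5904\u8F93\u5165\u4EA7\u54C1\u7F16\u53F7\u6216\u540D\u79F0";

    // Encode to the UTF-8 byte representation (assumes Java 7+ for StandardCharsets).
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = toUtf8(ESCAPED);
        System.out.println(ESCAPED.length());              // 13 chars
        System.out.println(utf8.length);                   // 39 bytes, 3 per character
        System.out.println(fromUtf8(utf8).equals(ESCAPED)); // true
    }
}
```

Whether the characters then display correctly depends on the console, file, or HTTP response they are written to, not on the String itself.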

Jon Skeet
Jon, where did Charsets.UTF_8 come from? It ain't in Java 1.6.
bmargulies
@bmargulies: It's in Guava (see http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Charsets.html)
Simon Nickerson
"请在此处输入产品编号或名称" this is my expected text on the browser for that Unicode input. How can i achieve this?
thndrkiss
@thndrkiss: You need to make sure the encoding you specify in the HTTP response matches what you're actually sending.
Jon Skeet
Great!! Thanks.
thndrkiss
@thndrkiss: if you're using JSP then you usually shouldn't need to worry about the conversion, as long as you specify an encoding that actually supports all the necessary characters (such as UTF-8).
Joachim Sauer
A: 

You said above that you are outputting to the browser? If you're using a servlet or similar, there are various ways of doing it, so you may need to be a bit more specific in your question. You can specify the charset (for example UTF-8 or UTF-16) in the HTTP response header or in the HTML output, e.g. by outputting the following tag inside the <head> element:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
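A minimal sketch of keeping the declared charset and the bytes actually sent in agreement, assuming plain Java with no servlet container. Utf8Page and its methods are hypothetical names used only for illustration; in a servlet, the header value would typically be passed to response.setContentType(...) before writing the body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative and not from the thread.
class Utf8Page {
    // Single source of truth for the encoding (assumes Java 7+).
    static final Charset CHARSET = StandardCharsets.UTF_8;

    // Value for the HTTP Content-Type header and for the <meta> tag.
    static String contentTypeHeader() {
        return "text/html; charset=" + CHARSET.name();
    }

    // Builds the page and encodes it with the same charset the header declares.
    // Note: message is not HTML-escaped here; real code should escape it.
    static byte[] body(String message) {
        String html = "<!DOCTYPE html><html><head>"
            + "<meta http-equiv=\"Content-Type\" content=\"" + contentTypeHeader() + "\">"
            + "</head><body>" + message + "</body></html>";
        return html.getBytes(CHARSET);
    }

    public static void main(String[] args) {
        String message = "\u8BF7\u5728\u6B64\u5904\u8F93\u5165";
        byte[] bytes = body(message);
        System.out.println(contentTypeHeader()); // text/html; charset=UTF-8
        // Decoding with the declared charset recovers the text.
        System.out.println(new String(bytes, CHARSET).contains(message)); // true
        // Decoding with a different charset garbles it, which is what a browser
        // does when the header and the bytes disagree.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).contains(message)); // false
    }
}
```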

James B