I am really discouraged by Java's string encoding. It performs many automatic conversions, and I can't find the rules behind them. Does anyone have a good idea? For example, a JSP page contains this link:

http://localhost:8080/helloworld/hello?world=凹ㄉ

And then we need to process it, so we do this:

String a = new String(request.getParameter("world").toString().getBytes("ISO-8859-1"), 
                      "UTF-8");
a = "http://localhost/" + a;

When I debug it, I find that a is correct.

Then I store it in the session: request.getSession().setAttribute("hello", a);

Later, in a JSP page with the encoding "Big5", I try to get the attribute and display it, and I find that the characters "凹ㄉ" are corrupted.

How can I solve this?

+4  A: 

That is not how you convert between character sets. What you need to be worrying about is this part:

 request.getParameter("world").toString().getBytes("ISO-8859-1")

Once you have it as a String, it is stored internally as 16-bit Unicode (UTF-16). Getting it as bytes and then telling Java to treat those bytes as if they were UTF-8 is not going to do anything good.

If you found a to be fine, that is just a coincidence. Once you call getParameter("world").toString(), you have your Unicode string. Any further decoding and encoding will just break certain characters; it just happens not to break yours.
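To see what the re-decoding step actually does, here is a minimal sketch. It isolates the conversion from the question and applies it to two inputs: a string whose UTF-8 bytes were wrongly decoded as ISO-8859-1 (an assumption about what the container may have done, not something stated in the question), and a string that was already decoded correctly. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {
    // The conversion from the question, isolated: re-encode the string as
    // ISO-8859-1 bytes, then decode those bytes as UTF-8.
    static String redecode(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u51F9\u3109"; // the two characters from the question's URL

        // Assumed case 1: the UTF-8 bytes on the wire were decoded as ISO-8859-1.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String misDecoded = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(redecode(misDecoded).equals(original)); // true

        // Case 2: the string is already correct Unicode. Neither character exists
        // in ISO-8859-1, so getBytes substitutes '?' for each and the text is lost.
        System.out.println(redecode(original).equals("??")); // true
    }
}
```

So the conversion only helps when the string was mis-decoded in exactly that way beforehand; applied to a correct string, it replaces every character outside ISO-8859-1 with a question mark.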

The question is: how do you get that attribute to display later? You say the JSP page's encoding is not Unicode but Big5, so what are you doing to get that string out of the attribute map and put it on that page? That is the likely source of the problem. Given the misunderstanding about how to handle the character conversion when getting the parameter, it is likely that there are some mistakes on that Big5 page as well.
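One thing worth ruling out on the output side is whether the page's charset can represent the string at all. The following is only a sketch of that check: the helper name fitsIn is hypothetical, and the Big5 lookup is guarded because that charset is not one of the charsets every Java runtime is required to provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageCharsetCheck {
    // Hypothetical helper: true if every character of text can be encoded
    // in the charset the page will be written in.
    static boolean fitsIn(String text, Charset pageCharset) {
        return pageCharset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u51F9\u3109"; // the two characters from the question

        System.out.println(fitsIn(text, StandardCharsets.UTF_8));      // true
        System.out.println(fitsIn(text, StandardCharsets.ISO_8859_1)); // false

        // Big5 is an optional charset, so check for it before using it.
        if (Charset.isSupported("Big5")) {
            System.out.println(fitsIn(text, Charset.forName("Big5")));
        }
    }
}
```

If the check passes for the page's charset and the output is still corrupted, the string in the session was probably already wrong, or the page is being written with a different encoding than the one it declares.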

By the way, do you really need to use Big5? Would UTF-16 work (if not UTF-8)? It could certainly remove some headaches.

Yishai
Yeah, I need to use Big5.
MemoryLeak