99.9% of the pages in my application use UTF-8 encoding.

However, for a special use case on the client side, I need one of them to use Unicode (2 bytes for each character).

To that end, the header of this page is:

<%@ page language="java" contentType="text/html; charset=unicode"%>
...<my content>...

This implementation works fine and does the job when the application runs on Tomcat and WebSphere. However, when it is deployed on WebLogic, I get the server error: unsupported encoding: 'unicode': java.io.UnsupportedEncodingException: unicode

Does anyone know how I can force WebLogic to send pages in 'Unicode' encoding?

+1  A: 

UTF-8 is Unicode. "Unicode" is not a character encoding on its own; it is a character mapping standard (a character set). Your problem lies somewhere else. Maybe you've had problems with GET request encoding, which is often overlooked. You may find this article useful for more background information and complete solutions on how to get the Unicode phenomenon to work in a Java EE web application: Unicode - How to get the characters right?
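As an aside on the request encoding point, the usual approach is a servlet filter along these lines. This is a minimal sketch (the class name is made up); note that `request.setCharacterEncoding()` only affects request bodies (POST), while GET query strings are decoded by the container itself (in Tomcat, via the `URIEncoding` attribute on the connector), which is exactly why GET encoding is so often overlooked:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical filter: forces UTF-8 on request bodies when the client
// did not declare an encoding. GET query strings are NOT affected here;
// those are decoded by the container before the filter ever runs.
public class CharacterEncodingFilter implements Filter {

    public void init(FilterConfig config) {
        // no configuration needed for this sketch
    }

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        chain.doFilter(request, response);
    }

    public void destroy() {
        // nothing to clean up
    }
}
```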

Good luck.

By the way, "2 bytes per character" is characteristic of UTF-16 for the majority of characters (code points 0x0000 through 0xFFFF are represented in 2 bytes, while UTF-8 uses 1, 2 or 3 bytes across those subranges). Maybe you just meant to use UTF-16 instead?
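One plausible explanation for the container difference: on HotSpot-based JVMs, "unicode" happens to be registered as an alias of UTF-16, so a container that hands the name straight to the JVM's charset lookup accepts it, while a container that validates names itself rejects it. You can check with plain Java (no container needed); this is an observation about the JVM's built-in charset table, not something guaranteed by any spec:

```java
import java.nio.charset.Charset;

public class CharsetAliasCheck {
    public static void main(String[] args) {
        // Resolves on HotSpot-based JVMs because "unicode" is an alias of
        // UTF-16 in the built-in charset table; other charset providers may
        // throw UnsupportedCharsetException here instead.
        Charset cs = Charset.forName("unicode");
        System.out.println(cs.name());     // prints "UTF-16" on Oracle/OpenJDK
        System.out.println(cs.aliases());  // the alias set includes "unicode"
    }
}
```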

BalusC
I have already tried to change charset=unicode to charset=UTF-16, but it didn't help. When Tomcat or WebSphere generates a page with charset=unicode, it sends the page to the client with some kind of encoding that I can't figure out (but it solves my specific problem), and for some reason WebLogic behaves differently.
Guy Roth
I suggest reading the linked article to **understand** what's going on, so that you can more easily nail down the root cause. In any decent browser (e.g. Firefox) you can check the encoding actually used through the View menu.
BalusC
OK, I cannot mark your answer because my problem is not actually solved, but you provided me some leads. I appreciate your effort.
Guy Roth
A: 

Unicode is not a charset, but there are charsets that allow characters in the Unicode system to be represented. You already know the UTF-8 charset, which encodes each character with 1, 2, 3 or 4 bytes, depending on the position of the character in the system. It seems that you want the UTF-16 charset, which encodes each character with 2 or 4 bytes.
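To make those byte counts concrete, here is a quick standalone check (class name illustrative; UTF-16BE is used because plain `getBytes("UTF-16")` would prepend a 2-byte byte-order mark to every result):

```java
import java.io.UnsupportedEncodingException;

public class EncodedLengthDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // "A" = 1 byte in UTF-8, "é" = 2, "€" = 3, and the surrogate pair
        // below (U+1D11E, musical G clef) = 4 bytes in both encodings.
        String[] samples = { "A", "\u00E9", "\u20AC", "\uD834\uDD1E" };
        for (String s : samples) {
            System.out.printf("U+%06X: UTF-8 = %d bytes, UTF-16 = %d bytes%n",
                    s.codePointAt(0),
                    s.getBytes("UTF-8").length,
                    s.getBytes("UTF-16BE").length);
        }
    }
}
```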

Note related to the answer provided by BalusC: here I use the word "charset" as shorthand for the character-encoding part of the Content-Type MIME header. Strictly speaking, the Universal Character Set provided by Unicode is a character set, but the charset parameter does not strictly specify a character set.

Damien B
You probably mean "encoding" or "character encoding" when you say charset. UTF-8 isn't a charset either (as you noted); it uses the Unicode charset. Using "charset" for "encoding" comes from a time when each widely used encoding was just a simple mapping of the values 0-255 to some characters, so each encoding had its own set of characters it supported. Modern encodings (UTF-8, UTF-16, even UTF-7) support all Unicode characters, so calling them a "charset" is not really correct.
Joachim Sauer
Joachim is right, but unfortunately this isn't used consistently everywhere. For example, the `Content-Type` header expects it to be specified as `charset`, not as `encoding` or so.
BalusC
Let's say that Joachim's comment was posted before the note was added :-)
Damien B