99.9% of the pages in my application use UTF-8 encoding.

However, for a special use case on the client side, I need one of them to use Unicode (2 bytes for each character).

To that end, the header of this page is:

<%@ page language="java" contentType="text/html; charset=unicode"%>
...<my content>...

This implementation works fine and does the job when the application runs on Tomcat and WebSphere. However, when it is deployed on WebLogic, I get the server error: unsupported encoding: 'unicode': java.io.UnsupportedEncodingException: unicode

Does anyone know how I can force WebLogic to send pages in 'Unicode' encoding?

+1  A: 

UTF-8 is Unicode. "Unicode" is not a character encoding on its own; it is a character mapping standard (a character set). Your problem lies somewhere else. Maybe you've had problems with GET request encoding, which is often overlooked. You may find this article useful for more background information and complete solutions on how to get the Unicode phenomenon to work in a Java EE web application: Unicode - How to get the characters right?
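As an aside on the request encoding point, the usual approach is a servlet filter along these lines. This is a minimal sketch (the class name is made up); note that `request.setCharacterEncoding()` only affects request bodies (POST), while GET query strings are decoded by the container itself (in Tomcat, via the `URIEncoding` attribute on the connector), which is exactly why GET encoding is so often overlooked:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical filter: forces UTF-8 on request bodies when the client
// did not declare an encoding. GET query strings are NOT affected here;
// those are decoded by the container before the filter ever runs.
public class CharacterEncodingFilter implements Filter {

    public void init(FilterConfig config) {
        // no configuration needed for this sketch
    }

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        chain.doFilter(request, response);
    }

    public void destroy() {
        // nothing to clean up
    }
}
```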

Good luck.

By the way, "2 bytes per character" is characteristic of UTF-16 for the majority of characters (code points 0x0000 through 0xFFFF are represented in 2 bytes, while UTF-8 uses 1, 2 or 3 bytes across those subranges). Maybe you just meant to use UTF-16 instead?
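One plausible explanation for the container difference: on HotSpot-based JVMs, "unicode" happens to be registered as an alias of UTF-16, so a container that hands the name straight to the JVM's charset lookup accepts it, while a container that validates names itself rejects it. You can check with plain Java (no container needed); this is an observation about the JVM's built-in charset table, not something guaranteed by any spec:

```java
import java.nio.charset.Charset;

public class CharsetAliasCheck {
    public static void main(String[] args) {
        // Resolves on HotSpot-based JVMs because "unicode" is an alias of
        // UTF-16 in the built-in charset table; other charset providers may
        // throw UnsupportedCharsetException here instead.
        Charset cs = Charset.forName("unicode");
        System.out.println(cs.name());     // prints "UTF-16" on Oracle/OpenJDK
        System.out.println(cs.aliases());  // the alias set includes "unicode"
    }
}
```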

BalusC
I have already tried to change charset=unicode to charset=UTF-16, but it didn't help. When Tomcat or WebSphere generates a page with charset=unicode, it sends the page to the client with some kind of encoding that I can't figure out (but it solves my specific problem), and for some reason WebLogic behaves differently.
Guy Roth
I suggest reading the linked article to **understand** what's going on, so that you can more easily nail down the root cause. In any decent browser (e.g. Firefox) you can check the encoding actually used through the View menu.
BalusC
OK, I cannot mark your answer because my problem is not actually solved, but you provided me some leads. I appreciate your effort.
Guy Roth
A: 

Unicode is not a charset, but there are charsets that allow characters in the Unicode system to be represented. You already know the UTF-8 charset, which encodes each character with 1, 2, 3 or 4 bytes, depending on the position of the character in the system. It seems that you want the UTF-16 charset, which encodes each character with 2 or 4 bytes.
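To make those byte counts concrete, here is a quick standalone check (class name illustrative; UTF-16BE is used because plain `getBytes("UTF-16")` would prepend a 2-byte byte-order mark to every result):

```java
import java.io.UnsupportedEncodingException;

public class EncodedLengthDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // "A" = 1 byte in UTF-8, "é" = 2, "€" = 3, and the surrogate pair
        // below (U+1D11E, musical G clef) = 4 bytes in both encodings.
        String[] samples = { "A", "\u00E9", "\u20AC", "\uD834\uDD1E" };
        for (String s : samples) {
            System.out.printf("U+%06X: UTF-8 = %d bytes, UTF-16 = %d bytes%n",
                    s.codePointAt(0),
                    s.getBytes("UTF-8").length,
                    s.getBytes("UTF-16BE").length);
        }
    }
}
```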

Note related to the answer provided by BalusC: here I use the word "charset" as shorthand for the character-encoding part of the Content-Type MIME header. Strictly speaking, the Universal Character Set provided by Unicode is a character set, but the charset parameter does not strictly specify a character set.

Damien B
You probably mean "encoding" or "character encoding" when you say charset. UTF-8 isn't a charset either (as you noted); it uses the Unicode charset. Using "charset" for "encoding" comes from a time when each widely used encoding was just a simple mapping of the values 0-255 to some characters, so each encoding had its own set of characters it supported. Modern encodings (UTF-8, UTF-16, even UTF-7) support all Unicode characters, so calling them a "charset" is not really correct.
Joachim Sauer
Joachim is right, but unfortunately this isn't used consistently everywhere. For example, the `Content-Type` header expects it to be specified as `charset`, not as `encoding` or so.
BalusC
Let's say that Joachim's comment was posted before the note was added :-)
Damien B