Hi all,

Consider this line of code, which decodes a URL-encoded Chinese phrase:

URLDecoder.decode("%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94",
    "UTF-8").getBytes().length

When I run it in a JSP page (on JBoss), it prints 5:

<%= URLDecoder.decode("%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94", 
       "UTF-8").getBytes().length %>

Running it in a desktop application prints 15:

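import java.net.URLDecoder;

public class DecodeTest {
    public static void main(String[] args) throws Exception {
        System.out.println(URLDecoder.decode(
            "%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94", "UTF-8"
        ).getBytes().length); // getBytes() uses the platform default charset
    }
}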

Why? And how can I make the JSP print 15 as well?

A:

It seems that JBoss is using a different default encoding, one that cannot represent all the characters in your string. You should probably use getBytes("UTF-8") so the result no longer depends on the platform default.
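A minimal sketch of that diagnosis, assuming JBoss's default charset is a single-byte one such as ISO-8859-1 (the answer does not say which charset it is, and the class name `CharsetLengths` is hypothetical). It encodes the decoded string with the default charset, with explicit UTF-8, and with ISO-8859-1:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLengths {
    // The same percent-encoded phrase used in the question.
    static final String ENCODED = "%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94";

    public static void main(String[] args) throws Exception {
        String decoded = URLDecoder.decode(ENCODED, "UTF-8");

        // This number varies with the JVM's default charset.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + decoded.getBytes().length);

        // Explicit UTF-8: each of these characters takes 3 bytes, so 15 on any JVM.
        System.out.println("UTF-8: "
                + decoded.getBytes(StandardCharsets.UTF_8).length);

        // Assumed JBoss-like case: ISO-8859-1 cannot map these characters, and
        // getBytes(Charset) substitutes one '?' byte per character, giving 5.
        System.out.println("ISO-8859-1: "
                + decoded.getBytes(StandardCharsets.ISO_8859_1).length);
    }
}
```

`String.getBytes(Charset)` replaces each unmappable character with the charset's default replacement bytes, which for ISO-8859-1 is a single `?`, so five characters become five bytes. In the JSP, calling `getBytes("UTF-8")` inside the expression should give 15 regardless of the server's default.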

Jörn Horstmann
A: 

I don't know why there is a difference (that depends on the particular Java environments you're running), but I can tell you what that difference is:

There are 15 bytes in your encoded string. In UTF-8 they represent 5 Unicode characters of 3 bytes each.

You can tell because the first byte of a 3-byte UTF-8 character always starts with hexadecimal "E" (that is, it lies in the range 0xE0 to 0xEF).
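To make that byte structure visible, here is a small sketch (the class `Utf8Breakdown` and its helper `utf8Hex` are hypothetical names). It decodes the string, compares its character count with its UTF-8 byte count, and prints each character's bytes in hex. The lead byte of each group starts with `E`, and the two continuation bytes after it lie in 0x80 to 0xBF:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class Utf8Breakdown {
    // Hypothetical helper: UTF-8 bytes of one code point as hex, e.g. "E4 BB BB".
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b & 0xFF)); // mask to treat byte as unsigned
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String decoded = URLDecoder.decode(
                "%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94", "UTF-8");

        // Prints "5 chars, 15 bytes".
        System.out.println(decoded.length() + " chars, "
                + decoded.getBytes(StandardCharsets.UTF_8).length + " bytes");

        // Walk the string one code point at a time and show each byte group.
        int i = 0;
        while (i < decoded.length()) {
            int cp = decoded.codePointAt(i);
            String hex = utf8Hex(cp);
            // A 3-byte lead byte is 0xE0..0xEF, so its hex form starts with "E".
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + " -> " + hex
                    + (hex.startsWith("E") ? "  (3-byte sequence)" : ""));
            i += Character.charCount(cp);
        }
    }
}
```

Printing code points as `U+XXXX` rather than the characters themselves keeps the output readable even on a console whose encoding cannot display Chinese.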

comingstorm