Hi gang,

I have an HTML page that looks like this:

<HTML>
<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>
<BODY onload='document.forms[0].submit();'>
<form name="form" method="post" action="/path/to/some/servlet">
<input type="hidden" name="username" value="麗安"> <!-- UTF-8 characters -->
</form>
</BODY>
</HTML>

As you can see, the content of this page is UTF-8, but I need to send it with GB2312 character encoding, because the servlet I am submitting this page to expects GB2312.

Is this a valid scenario? In the servlet, I couldn't retrieve these Chinese characters, even when using a filter that sets the character encoding to GB2312!

I've created a sample Servlet:

package org.daz;

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EncodingServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;
    private static final String ENCODING = "GB2312";

    protected void doPost(HttpServletRequest request, HttpServletResponse response) 
        throws ServletException, IOException {

        setCharacterEncoding(request, response);

        String username = request.getParameter("username");
        System.out.println(username);

    }

    private void setCharacterEncoding(HttpServletRequest request, HttpServletResponse response)throws IOException{
        request.setCharacterEncoding(ENCODING);
        response.setCharacterEncoding(ENCODING);
    }

}

The output is: 楹��

Please help

+1  A: 

You can try to do this,

<form name="form" method="post" action="/path/to/some/servlet" accept-charset="gb2312">
<input type="hidden" name="username" value="麗安"> <!-- UTF-8 characters -->
</form>

It might work in some browsers. However, browsers are not required to support GB2312, so your mileage may vary.

ZZ Coder
I've tried it in both Firefox and Chrome, and it doesn't seem to work for me!
Mohammed
It also depends on the OS. It works for me in IE6 on Chinese Windows XP.
ZZ Coder
+1  A: 

This is not possible. You'll need to use GB2312-encoded characters from the start (that is, save the page itself as GB2312), or change the entire application to use UTF-8 only. You can't convert from character encoding X to character encoding Y that way. Any character outside the ASCII range may get corrupted.

The form's accept-charset attribute, as some suggest, is ignored by most web browsers. The W3 spec also literally states "User agents may interpret .. ", not "must". And even then, it would only be used to encode the actual user input, not the hidden fields as in your example. Those have already been interpreted in the page's declared encoding (in this case GB2312). In other words, those UTF-8 characters are already corrupted at the moment the page is being processed by the browser.
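To see why the hidden value is damaged before the form is ever submitted, here is a minimal Java sketch that mimics the mismatch: it takes the UTF-8 bytes of the value and decodes them as GB2312, roughly what a browser does with a file saved as UTF-8 but declared as GB2312. The class and method names (EncodingMismatchDemo, misdecode, survivesRoundTrip) are made up for illustration, and the sketch assumes the JRE ships the GB2312 charset, which standard JDK builds normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the servlet from the question.
class EncodingMismatchDemo {

    // Decode the UTF-8 bytes of `text` as if they were written in `declared`,
    // simulating a page saved as UTF-8 but served with another charset.
    static String misdecode(String text, Charset declared) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, declared);
    }

    // Check whether the bytes re-encoded from the mis-decoded text are still
    // identical to the original UTF-8 bytes. If not, the original value can
    // no longer be recovered on the server side.
    static boolean survivesRoundTrip(String text, Charset declared) {
        byte[] original = text.getBytes(StandardCharsets.UTF_8);
        String seenByBrowser = new String(original, declared);
        byte[] submitted = seenByBrowser.getBytes(declared);
        return Arrays.equals(original, submitted);
    }

    public static void main(String[] args) {
        // Assumption: the GB2312 charset is available in this JRE.
        Charset gb2312 = Charset.forName("GB2312");

        // The value from the question, written with Unicode escapes so this
        // source file does not itself depend on the compiler's file encoding.
        String value = "\u9E97\u5B89";

        System.out.println("Mis-decoded value: " + misdecode(value, gb2312));
        System.out.println("Chinese value survives: " + survivesRoundTrip(value, gb2312));
        System.out.println("ASCII value survives:   " + survivesRoundTrip("admin", gb2312));
    }
}
```

Plain ASCII values pass through unchanged because UTF-8 and GB2312 agree on that range, which is why such a mismatch often goes unnoticed until non-ASCII data shows up.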

BalusC
Thanks very much. I've written up this conclusion on my blog: http://m-hewedy.blogspot.com/2010/05/beware-your-text-editor-encodes-your.html
Mohammed
You're welcome. By the way, my nickname is **BalusC**, not *BlueC* ;)
BalusC
Oh, you may find [this article](http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html) useful as well for more insight into the world of characters and bytes.
BalusC
Thanks very much. If you don't mind, I've referenced your blog article in my blog post. Thanks.
Mohammed
A: 
 <form accept-charset="gb2312"

http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset

irreputable
That didn't work either!
Mohammed