An HTTP POST request is made to my servlet. The request carries a posted form parameter named "payload", which my servlet code retrieves for further processing. When the value of the payload includes the windows-1252 character "’" (byte value 146, i.e. 0x92), the HttpServletRequest instance method getParameter("payload") returns null. There is nothing in the server.log related to the problem. We think the character encoding used to produce this character is windows-1252. The character encoding glassfish defaults to for http requests appears to be ISO-8859-1. Byte value 146 is a control character in ISO-8859-1.

Does anyone have any suggestions as to how I could solve this problem?

The http request headers in the post that showed the problem are:

POST /dbxchange/TechAnywhere HTTP/1.1
CONTENT_LENGTH: 13117
Content-type: application/x-www-form-urlencoded
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/4.0 (Windows Vista 6.0) Java/1.6.0_16
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 13117
A: 

We think the character encoding used to produce this character is windows-1252.

Yes, very probably. Even when browsers claim to be using ISO-8859-1, they are usually actually using windows-1252.

The character encoding glassfish defaults to for http requests appears to be ISO-8859-1

Most likely it is defaulting to your system's Java ‘default encoding’. This is rarely what you want, as it makes your application break when you redeploy it.

For reading POST request bodies, you should be able to fix the encoding by calling setCharacterEncoding on the request object, as long as you can do it early enough so that no-one has already caused it to read the body by calling methods such as getParameter. Try setting the encoding to "Cp1252". Although really you ought to be aiming for UTF-8 for everything in the long run.
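To see why the charset name matters here, the following is a minimal standalone sketch (plain JDK, no servlet container) that decodes the single byte 0x92 under each candidate encoding. The class and method names are made up for illustration, and it assumes the JDK in use ships the windows-1252 charset, which mainstream JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode one raw byte with the given charset and return the resulting
    // code point as an upper-case hex string.
    public static String decodeToHex(int rawByte, Charset charset) {
        String decoded = new String(new byte[] { (byte) rawByte }, charset);
        return Integer.toHexString(decoded.codePointAt(0)).toUpperCase();
    }

    public static void main(String[] args) {
        // "Cp1252" is an alias for the same charset.
        Charset cp1252 = Charset.forName("windows-1252");

        // U+2019, the right single quotation mark
        System.out.println(decodeToHex(0x92, cp1252));
        // U+0092, a C1 control character
        System.out.println(decodeToHex(0x92, StandardCharsets.ISO_8859_1));
        // A lone 0x92 is malformed UTF-8, so it becomes U+FFFD
        System.out.println(decodeToHex(0x92, StandardCharsets.UTF_8));
    }
}
```

Only the windows-1252 decode gives back the intended character, which is the reason for passing "Cp1252" rather than relying on the default.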

Unfortunately there is not a standard J2EE way to specify what encoding your application expects for all requests (including query string parameters, which are not affected by setCharacterEncoding). Each server has its own way, which creates annoying deployment issues. But for Glassfish, set a <parameter-encoding> in your sun-web.xml.
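As a rough sketch, the relevant fragment of sun-web.xml might look like the following. The UTF-8 value is only an assumption for illustration; use whatever charset your clients really send, and keep the rest of your existing descriptor as it is.

```xml
<sun-web-app>
  <!-- Charset used to decode request parameters for this web application -->
  <parameter-encoding default-charset="UTF-8"/>
</sun-web-app>
```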

bobince
I tried your suggestions but I am getting the same problem even though the character encoding appears to be set. request.getCharacterEncoding() returns "Cp1252" at the start of the processRequest(HttpServletRequest request, HttpServletResponse response) method.
JohnCooperNZ
+1  A: 

Java doesn't care about the differences between Cp1252 and Latin-1. Since there are no invalid byte sequences in either encoding, you wouldn't get null with either one. I think your server is using UTF-8 and the browser is using Cp1252 or Latin-1.

Try adding the following attribute to the form to see if it helps:

<form action="..." method="post" accept-charset="UTF-8"...>
ZZ Coder
+1 for the suggestion of fixing the HTML, which is better imo than blindly changing the encoding of the request
kdgregory
A: 

We have found that the problem is in the JavaScript code that sends the POST request. The JavaScript code was URL-encoding the value of the payload before sending the request, using the built-in function escape(). That function encoded the character as %u2019, which is a non-standard form of percent-encoding. It appears that GlassFish does not support this non-standard form.

See http://en.wikipedia.org/wiki/Percent-encoding#Non-standard%5Fimplementations

The fix was to use the built-in JavaScript function encodeURI(), which returns "%E2%80%99" for ’ (the UTF-8 bytes of the character, percent-encoded).
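The difference between the two forms can be reproduced on the Java side with a small sketch. It uses the JDK's java.net.URLDecoder as a stand-in, which is an assumption: GlassFish has its own parameter parser and may fail in a different way. The class name and the tryDecode helper are made up for illustration, and returning null on failure is just this sketch's choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeDemo {
    // Returns the decoded text, or null if the decoder rejects the input.
    public static String tryDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            // "%u2019" ends up here: "u2" is not a valid pair of hex digits
            return null;
        }
    }

    public static void main(String[] args) {
        // Standard percent-encoding of the UTF-8 bytes: decodes to U+2019
        System.out.println(tryDecode("%E2%80%99"));
        // Non-standard %uXXXX form produced by escape(): rejected
        System.out.println(tryDecode("%u2019"));
    }
}
```

For a single form parameter value, encodeURIComponent() may be the safer choice, since encodeURI() leaves characters such as & and = unescaped.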

JohnCooperNZ