I have a binary value being URL Encoded, and then POSTed to an HttpServlet. The following code shows how I first attempted to extract this data. Very simple except that the result is a String, not bytes.

This seemed to work at first, except that an extra byte appeared three bytes from the end. What I eventually figured out was that my data was being treated as Unicode and converted from one Unicode encoding to UTF-8.

So, other than getting the entire POST body and parsing it myself, how can I extract my data without it being treated as a string after the URL encoding is decoded? Have I misunderstood the specs for posted data in general, or is this a Java/Tomcat specific issue?

protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    // Receive/Parse the request
    String requestStr = request.getParameter("request");
    byte[] rawRequestMsg = requestStr.getBytes();

Here is a snippet of the Python test script I'm using for the request:

    urlRequest = urllib.urlencode( {'request': rawRequest} )

    connection = urllib.urlopen(self.url, data = urlRequest)
    result = connection.readlines()
    connection.close()
A: 

Erm, don't use java.

Oh yeah
A: 

you can do this with a servlet wrapper (HttpServletRequestWrapper)... catch the request and snatch the request body before it's decoded

but the best way is probably to send the data as a file upload (multipart/form-data content type)
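As a rough sketch of the "grab the body before it's decoded" idea: once you have the raw, still percent-encoded parameter value (for example, cut out of the body read from request.getInputStream()), it can be decoded straight to bytes without ever passing through a String charset conversion. The class and method names below are made up for illustration, and the input is assumed to be plain ASCII form encoding where '+' means a space, which is what Python's urllib.urlencode produces.

```java
import java.io.ByteArrayOutputStream;

public class PercentDecoder {

    // Decode an application/x-www-form-urlencoded value directly to bytes.
    // Hypothetical helper: assumes the input contains only ASCII characters.
    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= encoded.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c == '+') {
                out.write(' '); // form encoding uses '+' for a space
            } else {
                out.write(c);   // unreserved ASCII character, copied as one byte
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = decode("%00%FF+a%7f");
        System.out.println(raw.length); // prints 5
    }
}
```

Splitting the body into name=value pairs on '&' and '=' still has to happen before calling this, which is the "parsing it myself" part the question hoped to avoid.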

jspcal
+2  A: 

I think this should work (it treats the request as being in a single-byte encoding, so the conversion to String is completely reversible):

String someSingleByteEncoding = "ISO-8859-1";
request.setCharacterEncoding(someSingleByteEncoding);
String requestStr = request.getParameter("request"); 
byte[] rawRequestMsg = requestStr.getBytes(someSingleByteEncoding);
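
To see why a single-byte charset makes the conversion reversible, here is a small standalone check (no servlet involved; the class name is just for illustration). It pushes every possible byte value through a bytes -> String -> bytes round trip, once with ISO-8859-1 and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Decode the bytes to a String and encode them back with the same charset.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i;
        }

        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
        System.out.println(Arrays.equals(original,
                roundTrip(original, StandardCharsets.ISO_8859_1))); // prints true

        // UTF-8 replaces invalid byte sequences with U+FFFD, so data is lost.
        System.out.println(Arrays.equals(original,
                roundTrip(original, StandardCharsets.UTF_8)));      // prints false
    }
}
```

The same reasoning applies to the servlet code above only if the container really decodes the parameter as ISO-8859-1, which is what the setCharacterEncoding call is meant to ensure for the POST body.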
axtavt
This is working, but I'm not sure if I should consider it 'right' or not. This is for a web API to be exposed inside the company to a variety of people in a variety of languages.
DonGar
I'm going back to this as the 'right' answer. In large part because it will also allow calls to be done via GET as well as POST. The binary blobs in question are small (Protocol Buffer structs) and flexibility in calls to the server is important.
DonGar
You'll really need to document that the string has to be encoded with the very same charset prior to sending. To prepare for world domination, I would recommend using UTF-8 for that, on both sides.
BalusC
+1  A: 

There are two possible solutions:

  • ASCII-encode your data before POSTing it. Base64 would be a sensible choice. Decode it in your servlet and you have your original binary again.

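  • Use form content type multipart/form-data ( http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4 ) to encode your binary data as a stream of bytes; then your servlet can read it as a binary stream via servletRequest.getInputStream() (getReader() would decode the bytes into characters). Note that the multipart body still has to be parsed, for example with request.getPart() on Servlet 3.0 or later.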
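
A minimal sketch of the first option, assuming Java 8 or later for java.util.Base64; the class and method names are hypothetical. The URL-safe alphabet is used so the encoded value contains no '+', '/' or '=' characters that would need further escaping in a query string or form body.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Param {

    // What the client would do before putting the value in the form.
    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // What the servlet would do with the String from getParameter(...).
    static byte[] decode(String param) {
        return Base64.getUrlDecoder().decode(param);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, (byte) 0xFF, 0x7F, (byte) 0x80};
        String param = encode(original);
        byte[] restored = decode(param);
        System.out.println(Arrays.equals(original, restored)); // prints true
    }
}
```

In the servlet this would be used as Base64Param.decode(request.getParameter("request")). Because the parameter is pure ASCII, the character encoding the container picks no longer matters. On the Python side of the question, base64.urlsafe_b64encode should produce a compatible value, since the Java URL decoder accepts input with or without trailing '=' padding.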

Carl Smotricz
I think you're right and that using multipart/form-data is the correct answer.
DonGar