We use a web service that expects UTF-8. Our client framework is Apache Axis2. We call the web service, and the SOAP body contains strings in UTF-8. The problem is that the body seems to be "double encoded". For example, take the character 'å'. Its UTF-8 representation is C3 A5, but our logs show that the (double-)encoded value sent is C3 83 C2 A5.

Has anyone experienced similar problems?

+1  A: 

It's not entirely clear how you're calling the web service. Does the method in the web service just take a string? If so, what does your string look like in Java? All strings in Java are UTF-16 encoded - if you're converting the UTF-8 binary representation into a string by taking each byte and turning it into a character, then that's the problem.
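To illustrate the byte-by-byte conversion mentioned above, here is a minimal sketch. The class and method names are made up for the example and are not taken from the asker's code; the bytes are the UTF-8 encoding of 'å' from the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Decoding {
    // Wrong: turns every byte into its own character, which is
    // effectively the same as decoding the bytes as ISO-8859-1.
    static String byteByByte(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Right: decode the whole byte sequence with the charset it was encoded in.
    static String decodeUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA5}; // UTF-8 for 'å'
        System.out.println(byteByByte(bytes).length()); // 2 characters: "Ã¥"
        System.out.println(decodeUtf8(bytes).length()); // 1 character: "å"
    }
}
```

If the string already contains two characters where you expect one, the damage was done before the web service call.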

If you could show what the method you're calling looks like, and how you're calling it, that would help a lot.

For what it's worth, I've used Axis with non-ASCII strings with no problem in the past. I strongly suspect this is a problem with how you're using it rather than with Axis itself, although I'm willing to be proved wrong :)

EDIT: Based on your comment, it sounds like you've got problems receiving the HTML form data, before you hit the web service. If the user has typed "å" into the form, then that's what you should see when you debug in Eclipse. If you're putting bad data into your web service, it's no wonder you're getting bad data out at the other end. I suggest you run Wireshark to see exactly what the browser is sending you, both in terms of the raw bytes and also what character encoding it's specifying. My guess is that your web server is treating it as ISO-8859-1 but it's actually UTF-8.
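That guess can be checked against the bytes in the question with a small sketch. It only simulates the suspected mistake with plain JDK calls; the class and method names are hypothetical, and nothing here is Axis2 or servlet code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates a server that decodes UTF-8 form data as ISO-8859-1.
class DoubleEncodingDemo {
    // Formats bytes as space-separated upper-case hex, e.g. "C3 A5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // The suspected mistake: the browser sent UTF-8, the server decodes it as ISO-8859-1.
    static String misdecode(byte[] utf8FromBrowser) {
        return new String(utf8FromBrowser, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] sent = "\u00E5".getBytes(StandardCharsets.UTF_8); // 'å' as the browser sends it
        String received = misdecode(sent);                       // now the two characters "Ã¥"
        byte[] onTheWire = received.getBytes(StandardCharsets.UTF_8); // what a UTF-8 SOAP body would carry

        System.out.println(hex(sent));      // C3 A5
        System.out.println(hex(onTheWire)); // C3 83 C2 A5
    }
}
```

The second line of output matches the bytes from the logs, which is consistent with the string being wrong before it ever reaches the web service client.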

Once you've got the string correctly from the form, I suspect you'll find there are no problems at all in passing it on to the web service.

Jon Skeet
I'll try to explain how we call the web service. First of all, the web service is a third-party service. We have generated stubs from the WSDL file. The data is posted from an HTML form and looks like this: "å". If I debug it in Eclipse, it is displayed as "Ã¥". We create a "query" object, which is defined in the stub. We then create the envelope and the body and then call the web service method. We also set the property CHARACTER_SET_ENCODING to UTF-8 (but that should be the default, right?)
Johan Hammar
Your debugging says it all. The data does not enter your application properly. You probably want to use Wireshark to look at how the browser submits the data to your application, as that is where the problem lies.
Paul de Vrieze
Thank you both, it was indeed the data from the browser that caused the problem. I used Wireshark to check; I had thought Axis was the failing part. My web server now treats the data correctly as UTF-8. (For Tomcat, set URIEncoding="UTF-8" on the Connector in server.xml.)
Johan Hammar