I have a very peculiar problem. Could you please guide me?

Original message: Kevätsunnuntaisin lentää

The data flows HttpConnector -> WSDLConnector -> underlying system.

The first 7 characters are encoded as follows.

In HttpConnector (the request XML is UTF-8 encoded): 4b 65 76 c3 a4 74 73 75

In WSDLConnector: 4b 65 76 a3 74 73 75

The WSDLConnector parses the stream like this:

InputSource inputSource = new InputSource(myInputStream);
inputSource.setEncoding("UTF-8");

parser.parse(inputSource);

The original string gets converted to Kev£tsunnuntaisin lent££. Also, one byte is lost for each ä.

Could you please guide me where I am going wrong? What must I do to avoid this character conversion?

Thanks for your help!!!

+1  A: 

This is very simple: The data in myInputStream is not encoded as UTF-8, hence the decoding fails.
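One way to check what the stream really contains is to dump the bytes and compare them with what a known encoding would produce. The sketch below is only an illustration: the class name `ByteDump` and the helper `hex` are made up for this example, and ISO-8859-1 is used merely as a representative single-byte encoding. It reproduces the 8-byte UTF-8 dump from the question and shows that a single-byte encoding yields 7 bytes, which matches the "lost byte". It does not by itself explain the value a3.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of either connector.
class ByteDump {

    // Render the bytes of s in the given charset as space-separated hex.
    static String hex(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String prefix = "Kev\u00e4tsu"; // first 7 characters of the message
        // UTF-8 needs two bytes for the a-umlaut: 8 bytes in total.
        System.out.println(hex(prefix, StandardCharsets.UTF_8));      // 4b 65 76 c3 a4 74 73 75
        // A single-byte encoding needs one byte per character: 7 bytes.
        System.out.println(hex(prefix, StandardCharsets.ISO_8859_1)); // 4b 65 76 e4 74 73 75
    }
}
```

If the dump taken right before parser.parse(...) does not show c3 a4 for the ä, the bytes were already re-encoded somewhere between the two connectors.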

My guess is that you save the output of the HTTP connector as a String and then use that as the input for the WSDL connector. In a Java String, the data is Unicode (UTF-16 code units), not UTF-8 bytes. Use String.getBytes("UTF-8") to get an array of bytes with the correct encoding.
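A minimal sketch of that suggestion, assuming the payload really is available as a String at the point where the WSDL connector builds its input stream. The class and method names are hypothetical, and a DOM DocumentBuilder stands in for the unspecified parser from the question. StandardCharsets.UTF_8 (Java 7 and later) is used here as an equivalent of passing "UTF-8" that avoids the checked exception.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not the connector's real code.
class Utf8InputExample {

    // Encode the String explicitly as UTF-8 and wrap the bytes for the parser.
    static InputSource toUtf8Source(String xml) {
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(utf8);
        InputSource source = new InputSource(in);
        source.setEncoding("UTF-8"); // now matches the actual bytes
        return source;
    }

    // Parse the document and return the text of its root element.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(toUtf8Source(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String message = "Kev\u00e4tsunnuntaisin lent\u00e4\u00e4";
        String parsed = rootText("<msg>" + message + "</msg>");
        // The text survives the round trip when bytes and declared encoding agree.
        System.out.println(parsed.equals(message)); // true
    }
}
```

The important part is that the encoding used to produce the bytes and the encoding passed to setEncoding are the same.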

As with all encoding issues: always tell the computer which encoding to use instead of hoping that it will guess correctly. Bytes carry no encoding information, and the computer is not telepathic :) And I hope it never will be ...
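To illustrate that point, here is a small sketch that decodes the same bytes with two different, explicitly named charsets. The class and method names are invented for the example, and ISO-8859-1 is just one possible wrong guess.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same bytes, two explicit decodings.
class ExplicitCharset {

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for the five characters K, e, v, a-umlaut, t
        byte[] wire = {0x4b, 0x65, 0x76, (byte) 0xc3, (byte) 0xa4, 0x74};

        // Correct charset: c3 a4 becomes one character.
        System.out.println(decodeUtf8(wire).length());   // 5
        // Wrong charset: c3 and a4 become two separate characters.
        System.out.println(decodeLatin1(wire).length()); // 6
    }
}
```

Nothing in the byte array says which of the two results is right. Only the charset named by the caller decides that.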

Aaron Digulla