views:

310

answers:

5

When parsing an XML file in Java, I get this error:

An invalid XML character (Unicode: 0x0) was found in the element content of the document.

The XML comes from a web service.

The problem is that I get the error only when the web service is running on localhost (Windows + Tomcat), but not when it is running online (Linux + Tomcat).

How can I replace the invalid character? Thanks.

+1  A: 

This is an encoding issue. Either you are reading the input stream as UTF-8 when it is not UTF-8, or the other way around.

You should specify the encoding explicitly when you read the content. E.g. via

new InputStreamReader(getInputStream(), "UTF-8")
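As a minimal sketch of the idea above, the following wraps the stream in a reader with an explicit charset before handing it to the JDK's DOM parser. The class name `ExplicitEncodingParse`, the helper `parseUtf8`, and the sample document are made up for illustration, and it assumes the service really does send UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitEncodingParse {
    // Hypothetical helper: decodes the stream as UTF-8 explicitly
    // instead of relying on the platform default charset.
    public static Document parseUtf8(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Sample payload standing in for the web service response (assumption).
        byte[] xml = "<greeting>caf\u00e9</greeting>".getBytes(StandardCharsets.UTF_8);
        Document doc = parseUtf8(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the parser receives a character stream here, it uses the charset given to the reader rather than detecting the encoding from the bytes, so a wrong guess fails in the same way as before.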

Another cause could be Tomcat itself. Try adding URIEncoding="UTF-8" to the connector settings in Tomcat's server.xml file, because:

It turned out that the JSP specification says that if the page encoding of the JSP pages is not explicitly declared, then ISO-8859-1 should be used (!).

Taken from here.

Karussell
A: 

A bit of looking around reveals that 0x0 is the null character. Someone else had the same problem with XML and null characters here: http://forums.sun.com/thread.jspa?threadID=579849. I'm not sure how you are parsing the XML, but if you get it as a string first, there is some discussion on how to replace the null here: http://forums.sun.com/thread.jspa?threadID=628189.

Mark Davidson
+2  A: 

Unicode character 0x0 represents NULL, which means that the data you're pulling contains a NULL somewhere (which is not allowed in XML, hence your error).

Make sure that you find out what causes the NULL in the first place.
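One way to start looking for the cause is to inspect the raw bytes before any decoding happens. The sketch below is only a diagnostic aid: `NulFinder` and `nulOffsets` are hypothetical names, and the sample bytes stand in for the real response. A zero at every other offset usually suggests UTF-16 data being read as a single-byte encoding, while a few isolated zeros point more toward corrupted or padded content.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NulFinder {
    // Returns the offset of every 0x00 byte in the raw response.
    public static List<Integer> nulOffsets(byte[] raw) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == 0) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // UTF-16BE encodes each ASCII character as 0x00 followed by the character byte.
        byte[] utf16 = "<a/>".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(nulOffsets(utf16)); // [0, 2, 4, 6]

        // The same text in UTF-8 contains no zero bytes.
        byte[] utf8 = "<a/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(nulOffsets(utf8)); // []
    }
}
```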

Also, how are you interacting with the WebService? If you're using Axis, make sure that the WSDL has some encoding specified for data in and out.

The Elite Gentleman
+1 for common sense approach. Blindly fixing such an error without caring where it came from is not a good idea.
Tomalak
A: 

Fixed with this code:

import java.util.regex.Pattern;

// Remove every NUL (U+0000) character from the raw XML text.
// "\\x00+" matches one or more NULs; the earlier "[\\000]*" also matched the empty string.
Pattern nulPattern = Pattern.compile("\\x00+");
String cleanXMLString = nulPattern.matcher(dirtyXMLString).replaceAll("");
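If the XML is parsed from a stream rather than held in a String first, the same clean-up can be done while reading. This is a rough sketch, not part of the original fix: `NulStrippingReader` is a hypothetical class built on the JDK's `FilterReader`, and like the snippet above it hides the symptom without explaining where the NULs come from.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

public class NulStrippingReader extends FilterReader {
    public NulStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip NUL characters; -1 (end of stream) passes through unchanged.
        do {
            c = super.read();
        } while (c == 0);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream
            }
            // Compact the chunk in place, dropping NUL characters.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char ch = buf[off + i];
                if (ch != 0) {
                    buf[off + kept++] = ch;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was NULs; read the next one.
        }
    }
}
```

It would be used by wrapping the decoding reader, for example `new NulStrippingReader(new InputStreamReader(in, "UTF-8"))`, and passing the result to the parser.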
Giancarlo
A: 

Most probably through encoding and decoding. If you open the file in a simple text editor, you should find that the first character is \ufeff, which I believe translates to NULL or 0x0.

medopal
`\uFEFF` is not NUL, it's a BOM: http://en.wikipedia.org/wiki/Byte_order_mark
Alan Moore