Hi,
I have to parse content that I get from the web, and it can contain special characters. In this case the content string looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<id>1</id>
<price>2.14</price>
<title>test ž test</title>
When the content above is passed to the characters() method of a class that extends org.xml.sax.helpers.DefaultHandler:
public class ProductsXMLHandler extends DefaultHandler {
    ...
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String elementValue = new String(ch, start, length);
        ...
    }
}
I noticed that the text test ž test is delivered in three separate chunks: 'test ', 'ž' and ' test', so elementValue never equals test ž test, which should be the result. Does anyone know how to solve the problem?
Is it necessary to re-encode the source string:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<id>1</id>
<price>2.14</price>
<title>test ž test</title>
before it is passed to the XML handler class?
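For context, the SAX ContentHandler documentation states that a parser may deliver contiguous character data in several chunks, so multiple characters() calls for one element are allowed behaviour. Below is a minimal sketch of the usual workaround, which is to buffer the chunks and read the full value in endElement(). The class name BufferingProductsHandler and the helper parseTitles are hypothetical names made up for this illustration, and the sketch assumes the special character arrives as a numeric character reference such as &#382; in the test input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: collects the text of every <title> element.
class BufferingProductsHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Start each element with an empty buffer.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append instead of assigning.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The complete text is only known once the element ends.
        if ("title".equals(qName)) {
            titles.add(buffer.toString());
        }
    }

    // Hypothetical helper: parses an XML string and returns all title values.
    static List<String> parseTitles(String xml) throws Exception {
        BufferingProductsHandler handler = new BufferingProductsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }
}
```

With this approach the number of characters() calls no longer matters, and it should not be necessary to re-encode the source string first, as long as the parser is given the input in the encoding it declares.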
Thank you!