Hello everyone,

I'm parsing a (not well-formed) Apple plist file with Java.

My code looks like this:

InputStream in = new FileInputStream("foo");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader parser = factory.createXMLEventReader(in);
while (parser.hasNext()) {
    XMLEvent event = parser.nextEvent();
    // code to navigate the nodes
}

The parts I'm parsing look like this:

<dict>    
  <key>foo</key><integer>123</integer>
  <key>bar</key><string>Boom &amp; Shroom</string>
</dict>

My problem is that nodes containing an ampersand are not parsed the way I'd expect, because the ampersand starts an entity reference.

What can I do to get the value of the node as one complete String, instead of broken parts?

Thank you in advance.

A: 

Okay, thanks to your comment I wrote a little method:

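static String getString(XMLEventReader parser, XMLEvent event) throws XMLStreamException {
    // The passed-in event is not needed; the parameter is kept so the
    // signature stays the same.
    StringBuilder s = new StringBuilder();
    // peek() looks at the next event without consuming it, so the event that
    // ends the text (e.g. the end tag) stays in the stream for the caller.
    while (parser.hasNext() && parser.peek().isCharacters()) {
        s.append(parser.nextEvent().asCharacters().getData());
    }
    return s.toString();
}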

This works great for nodes that only contain ampersands, but some of the nodes also contain stuff like >, which shows up as a different event type :/ So this method won't work.

Jannis
A: 

There is a predefined method, getElementText(), which is buggy in jdk1.6.0_15 but works OK with jdk1.6.0_19. A complete program that parses the plist file:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class Parser {

    public static void main(String[] args) throws XMLStreamException, IOException {
        InputStream in = new FileInputStream("foo.xml");
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader parser = factory.createXMLEventReader(in);

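        // Read the event outside the assert: assertions are disabled by
        // default, and then the expression inside assert is never evaluated.
        XMLEvent first = parser.nextEvent();
        assert first.isStartDocument();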

        XMLEvent event = parser.nextTag();
        assert event.isStartElement();
        final String name1 = event.asStartElement().getName().getLocalPart();

        if (name1.equals("dict")) {
            while ((event = parser.nextTag()).isStartElement()) {
                final String name2 = event.asStartElement().getName().getLocalPart();

                if (name2.equals("key")) {
                    String key = parser.getElementText();
                    System.out.println("key: " + key);

                } else if (name2.equals("integer")) {
                    String number = parser.getElementText();
                    System.out.println("integer: " + number);

                } else if (name2.equals("string")) {
                    String str = parser.getElementText();
                    System.out.println("string: " + str);
                }
            }
        }

        // Same as above: keep the side effect out of the assert.
        XMLEvent last = parser.nextEvent();
        assert last.isEndDocument();
    }
}
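
If you want the values keyed by name rather than printed, the same getElementText() approach can fill a map. This is only a rough sketch: the class name DictReader and the helper readDict are made up for illustration, it assumes a flat dict whose values are string or integer elements, and it reads from an in-memory string so it can be tried without a file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class DictReader {

    // Hypothetical helper: pairs each <key> with the <string>/<integer> that follows it.
    static Map<String, String> readDict(String xml) throws XMLStreamException {
        XMLEventReader parser = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        Map<String, String> dict = new LinkedHashMap<>();
        String pendingKey = null;

        while (parser.hasNext()) {
            XMLEvent event = parser.nextEvent();
            if (!event.isStartElement()) {
                continue; // skip whitespace, end tags, document events
            }
            String name = event.asStartElement().getName().getLocalPart();
            if (name.equals("key")) {
                // getElementText() returns the whole text content up to the end tag
                pendingKey = parser.getElementText();
            } else if (pendingKey != null
                    && (name.equals("string") || name.equals("integer"))) {
                dict.put(pendingKey, parser.getElementText());
                pendingKey = null;
            }
        }
        parser.close();
        return dict;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<dict>"
                + "<key>foo</key><integer>123</integer>"
                + "<key>bar</key><string>Boom &amp; Shroom</string>"
                + "</dict>";
        System.out.println(readDict(xml));
    }
}
```

All values stay as Strings here; converting the integer entries with Integer.parseInt would be a separate step.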
Roland Illig
+1  A: 

You should be able to solve your problem by setting the IS_COALESCING property on the XMLInputFactory (I also prefer XMLStreamReader over XMLEventReader, but ymmv):

XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

InputStream in = // ...
XMLStreamReader xmlReader = factory.createXMLStreamReader(in, "UTF-8");

Incidentally, to the best of my knowledge none of the JDK parsers will handle "not well formed" XML without choking. Your XML is, in fact, well-formed: it uses an entity rather than a raw ampersand.
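
To try the IS_COALESCING setting on the sample from the question, here is a minimal sketch with XMLStreamReader. The class name CoalescingDemo and the helper readStrings are invented for the example, and the XML comes from an in-memory string instead of a file. With coalescing on, the text of each string element should arrive as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingDemo {

    // Hypothetical helper: collects the text of every <string> element.
    static List<String> readStrings(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        boolean inString = false;
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    inString = reader.getLocalName().equals("string");
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    inString = false;
                } else if (type == XMLStreamConstants.CHARACTERS && inString) {
                    // One event per element body when coalescing is enabled.
                    result.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<dict>"
                + "<key>foo</key><integer>123</integer>"
                + "<key>bar</key><string>Boom &amp; Shroom</string>"
                + "</dict>";
        System.out.println(readStrings(xml));
    }
}
```

Without the setProperty line the parser is free to split the text at the entity reference, in which case the list would hold several fragments for one element.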

Anon