I'm looking to use Java to parse an ongoing stream of event-driven XML generated by a remote device. Here's a simplified sample of two events:

<?xml version="1.0"?>
<Event> DeviceEventMsg
<Param1>SomeParmValue</Param1>
</Event>
<?xml version="1.0"?>
<Event> DeviceEventMsg
<Param1>SomeParmValue</Param1>
</Event>

It seems like SAX is better suited to this than DOM because it is an ongoing stream, though I'm not as familiar with SAX. Don't yell at me for the structure of the XML; I know about it already and can't change it.

And yes, the device DOES send the XML declaration before every event. My first problem is that the second XML declaration, which the parser treats as a processing instruction, is croaking the SAX parser.

Can anyone suggest a way to get around that?

A: 

How quickly is the data stream updating? Is the connection lost between xml headers?

simon
The events are home automation device on and off events, so they can be seconds apart, but with long stretches of inactivity. The connection is maintained between events.
Steve Prior
A: 

The code I'm using so far, which is croaking on the second XML processing instruction, is:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

// HandlerBase is the SAX1 handler; it is deprecated in favor of DefaultHandler.
public class TestMe extends HandlerBase {

    public void startDocument() throws SAXException { System.out.println("got startDocument"); }

    public void endDocument() throws SAXException { System.out.println("got endDocument"); }

    public void startElement(String name, AttributeList attrs) throws SAXException { System.out.println("got startElement"); }

    public void endElement(String name) throws SAXException { System.out.println("got endElement"); }

    public void characters(char[] buf, int offset, int len) throws SAXException { System.out.println("found characters"); }

    public void processingInstruction(String target, String data) throws SAXException { System.out.println("got processingInstruction"); }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            // using a file as test input for now
            saxParser.parse(new File("devmodule.xml"), new TestMe());
        } catch (Throwable err) {
            err.printStackTrace();
        }
    }
}

Steve Prior
+1  A: 

Try using StAX instead of SAX. StAX allows much more flexibility and is a better fit for streaming XML. There are a few implementations of StAX; I am very happy with the Codehaus one, but there is also one from Sun. It might solve your problems.

eishay
A: 

Do you know of a way to tell StAX not to barf on the line

<?xml version="1.0"?>

which is in the middle of the input I gave above? Again it's fixed in what the device provides and I cannot change it.

Steve Prior
A: 

If you print out the element name in startElement and endElement with System.out.println(), you will get something like this:

got startDocument
got startElement Event
found characters
found characters
got startElement Param1
found characters
got endElement Param1
found characters
got endElement Event
org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.
...

So I think the second

<?xml version="1.0"?>

without getting an endDocument is causing a parser problem.

simon
A: 

If you add this:

catch(SAXException SaxErr){
  System.out.println("ignore this error");
 }

before the other catch, you will catch this particular error. You would then have to reopen the device or, for the static file case, keep track of where you are in the file.

Or, at the end of each Event element, close the device/file and then reopen it for the next event.

simon
A: 

RE: Simon's suggestion of catching the SAXException to determine when you've come to the end of one XML document and reached the start of another, I think this would be a problematic approach. If another error occurred (for whatever reason), you wouldn't be able to tell whether the exception had been thrown due to erroneous XML or because you'd reached the end of a document.

The problem is that the parser is built to process a single XML document, not a stream of several XML documents. I would suggest writing some code to manually parse the incoming data stream, breaking it into individual streams that each contain a single XML document, and then passing these streams to the XML parser serially (which guarantees the order of your events).
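
The splitting step could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: it assumes every event ends with the literal closing tag </Event>, so a document can be handed to the parser as soon as that tag arrives rather than waiting for the next declaration. The class and method names (EventSplitter, readNextDocument, parseAll) are made up for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventSplitter {
    // Assumption: each event document ends with this exact closing tag.
    private static final String END_TAG = "</Event>";

    /** Reads characters up to and including the next END_TAG; returns null at end of stream. */
    static String readNextDocument(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            int tail = buf.length() - END_TAG.length();
            if (c == '>' && tail >= 0 && buf.indexOf(END_TAG, tail) == tail) {
                // Trim so the XML declaration sits at the very start of the document.
                return buf.toString().trim();
            }
        }
        return null; // stream closed; any incomplete trailing data is dropped
    }

    /** Parses each event as its own document, in arrival order, and collects element names. */
    static List<String> parseAll(Reader in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        };
        String doc;
        while ((doc = readNextDocument(in)) != null) {
            // A fresh parser per document keeps the sketch simple; events are infrequent.
            factory.newSAXParser().parse(new InputSource(new StringReader(doc)), handler);
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String event = "<?xml version=\"1.0\"?>\n<Event> DeviceEventMsg\n"
                + "<Param1>SomeParmValue</Param1>\n</Event>\n";
        // Two events back to back, as the device sends them.
        System.out.println(parseAll(new StringReader(event + event)));
        // prints [Event, Param1, Event, Param1]
    }
}
```

In real use the Reader would wrap the device connection instead of a string, and a genuine parse error in one event would surface as an exception for that event alone.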

sgreeve
Are there no XML parsers which will catch a series of XML documents coming in through one continuous input stream?
Steve Prior
XML parsers are designed to parse well-formed XML documents (well, technically, some can probably handle document fragments). What you have is not a well-formed XML document.
ykaganovich
A: 

@sgreeve, agreed. With my suggestion you would have to somehow check for this particular error, or treat any error as the end of the document.

Your suggestion is good: pre-parse the stream (by looking for a known pattern) into a well-formed document or documents before passing it to the XML parser.

simon
+1  A: 

@Longhorn213 I didn't run a test with your XML sample, but the way to go would be to get each XMLEvent from the XMLEventReader using nextEvent(). On the XMLEvent, call getEventType() and check that it is not equal to XMLStreamConstants.PROCESSING_INSTRUCTION or any other event type you wish to ignore.
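
A minimal sketch of that event loop, using the StAX API bundled with the JDK, might look like the following. The class name StaxEventFilter and the <?device ping?> instruction are invented for the example. Note that it only demonstrates skipping ordinary processing instructions inside a single well-formed document. A repeated <?xml ...?> declaration is not an ordinary processing instruction, and a parser may well reject it before any event is delivered, so this alone may not get past the second declaration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxEventFilter {

    /** Returns the names of all start elements, ignoring processing instructions. */
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                int type = event.getEventType();
                if (type == XMLStreamConstants.PROCESSING_INSTRUCTION) {
                    continue; // an event type we chose to ignore
                }
                if (type == XMLStreamConstants.START_ELEMENT) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical input: one event with an ordinary processing instruction inside it.
        String xml = "<?xml version=\"1.0\"?><Event> DeviceEventMsg<?device ping?>"
                + "<Param1>SomeParmValue</Param1></Event>";
        System.out.println(elementNames(xml)); // prints [Event, Param1]
    }
}
```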

eishay
+1  A: 

One more suggestion, specifically regarding multiple XML declarations. Yes, this is ILLEGAL XML, so proper parsers will barf on it in their default modes. But some parsers have alternate "multi-document" modes. For example, Woodstox has this, so you can check out:

http://www.cowtowncoder.com/blog/archives/2008/04/entry_66.html

Basically, you have to tell the parser (via the input factory) that the input is in the form of "multiple XML documents" (ParsingMode.PARSING_MODE_DOCUMENTS).

If so, it will accept multiple XML declarations, each one indicating the start of a new document.

StaxMan