I'm using Java to connect to a TCP port over which XML documents are streamed one after another, each delimited by the <?xml declaration that starts a document. An example that demonstrates the format:

<?xml version="1.0"?>
<person>
    <name>Fred Bloggs</name>
</person>
<?xml version="1.0"?>
<person>
    <name>Peter Jones</name>
</person>

I'm using the org.xml.sax.* API. The SAX parsing works perfectly for the first document but throws an exception when it reaches the start of the second document:

Exception in thread "main" org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.

The following skeleton class demonstrates the setup I'm using:

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.net.Socket;

public class XMLTest extends DefaultHandler {

  public XMLTest() {
     super();
  }

  public static void main(String[] args) throws Exception {
    XMLReader xr = XMLReaderFactory.createXMLReader();

    XMLTest handler = new XMLTest();
    xr.setContentHandler(handler);
    xr.setErrorHandler(handler);

    xr.parse(new InputSource(new Socket("127.0.0.1", 4555).getInputStream()));
  }
}

I have no control over the format of the XML (it's a financial data feed), but I need to parse it efficiently, and parse every document. I've spent the afternoon and evening trying different things, but none have worked. Any help would be greatly appreciated.
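The error can be reproduced without a socket. An XML declaration is only allowed at the very start of a document, so once the first root element has closed, the parser reads the second `<?xml ...?>` as a processing instruction with the reserved target `xml` and rejects it. The sketch below feeds the two example documents from a string; the class and method names are hypothetical, and it obtains the `XMLReader` from `SAXParserFactory` because `XMLReaderFactory` (used in the question) has been deprecated since Java 9.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ConcatenatedDocsDemo {

    // Two documents back to back, in the same shape as the feed above.
    static final String FEED =
          "<?xml version=\"1.0\"?>\n<person>\n    <name>Fred Bloggs</name>\n</person>\n"
        + "<?xml version=\"1.0\"?>\n<person>\n    <name>Peter Jones</name>\n</person>\n";

    /** Parses the whole string as ONE document; returns the parse error message, or null on success. */
    static String parseAsOneDocument(String xml) {
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            xr.setContentHandler(handler);
            xr.setErrorHandler(handler); // DefaultHandler.fatalError rethrows the exception
            xr.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Fails when the parser reaches the second <?xml declaration.
        System.out.println(parseAsOneDocument(FEED));
    }
}
```

A single document on its own parses fine, which is why the first document appears to work until the second declaration arrives.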

A:

You'll want to split the stream at every <?xml version="1.0"?> declaration and parse each document separately. A BufferedReader can help with this. Kickoff example:

BufferedReader reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
StringBuilder builder = null;
for (String line; (line = reader.readLine()) != null;) {
    if (line.startsWith("<?xml")) {
        if (builder != null) {
            xr.parse(new InputSource(builder.toString()));
        }
        builder = new StringBuilder();
    }
    builder.append(line);
}
BalusC
When doing this with `input` set to `InputStream input = new Socket("127.0.0.1", 4500).getInputStream();`, I get the following exception: Exception in thread "main" java.io.FileNotFoundException: /Users/admin/IdeaProjects/XMLTest/< (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at java.io.FileInputStream.<init>(FileInputStream.java:66) It seems xr.parse() doesn't like strings, even when wrapped in an InputSource.
jkt
Do you consider yourself capable of interpreting stack traces? I don't see how a `FileNotFoundException` is related to any of this. I'd say your problem lies somewhere else, maybe in a step after parsing. The filename given in the exception message, `/Users/admin/IdeaProjects/XMLTest/<`, indeed does not look valid, by the way. Reread the stack trace, use its line numbers to trace back to the location in the code that caused this, nail down the root cause and fix it. If you get stuck and this problem is indeed unrelated to this question, ask a new question (e.g. "How to save an XML file?").
BalusC
Hey, I can read stack traces; I only pasted the first few lines. The frame pointing to my code is `at XMLTest.main(XMLTest.java:42)`, and line 42 is `xr.parse(new InputSource(builder.toString()));` (which is from your example above). I appreciate your assistance with this.
jkt
The solution is to wrap the StringBuilder's contents in a StringReader, i.e. `xr.parse(new InputSource(new StringReader(builder.toString())));`. Thanks for your assistance!
jkt
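Putting the kickoff loop and the StringReader fix together, a complete version might look like the sketch below. The underlying cause of the `FileNotFoundException` is that `new InputSource(String)` treats its argument as a system ID (a URI to fetch), not as XML content, so the parser tried to open a file. Two further gaps in the kickoff loop are also closed here: the last document is parsed after the loop ends (nothing else triggers it), and a newline is kept per line. `SplitFeedParser`, `parseFeed` and the `<name>`-collecting handler are hypothetical names for illustration, and like the original loop it assumes every `<?xml` declaration starts at the beginning of a line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SplitFeedParser extends DefaultHandler {

    // Text of every <name> element, accumulated across all documents.
    final List<String> names = new ArrayList<>();
    private StringBuilder text; // non-null only while inside <name>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("name")) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            names.add(text.toString());
            text = null;
        }
    }

    /** Splits the stream at lines starting with "<?xml" and parses each document separately. */
    static List<String> parseFeed(Reader in)
            throws IOException, SAXException, ParserConfigurationException {
        SplitFeedParser handler = new SplitFeedParser();
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);

        BufferedReader reader = new BufferedReader(in);
        StringBuilder builder = null;
        for (String line; (line = reader.readLine()) != null;) {
            if (line.startsWith("<?xml")) {
                if (builder != null) parse(xr, builder); // previous document is complete
                builder = new StringBuilder();
            }
            if (builder != null) builder.append(line).append('\n');
        }
        if (builder != null) parse(xr, builder); // the last document has no successor to trigger it
        return handler.names;
    }

    private static void parse(XMLReader xr, StringBuilder doc) throws IOException, SAXException {
        // StringReader supplies the XML text itself; the XMLReader may be reused once a parse completes.
        xr.parse(new InputSource(new StringReader(doc.toString())));
    }

    public static void main(String[] args) throws Exception {
        String feed = "<?xml version=\"1.0\"?>\n<person>\n    <name>Fred Bloggs</name>\n</person>\n"
                    + "<?xml version=\"1.0\"?>\n<person>\n    <name>Peter Jones</name>\n</person>\n";
        System.out.println(parseFeed(new StringReader(feed))); // [Fred Bloggs, Peter Jones]
    }
}
```

With the socket from the question, the call would be along the lines of `parseFeed(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))`; UTF-8 is the same assumption the kickoff example makes. Because `parseFeed` only returns at end of stream, a long-lived feed would do its per-document work inside the handler callbacks rather than waiting for the returned list.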