ansaurus

Question

Java: How to split XML stream into small XML documents? XPath on streaming XML parser?

Answer 1

+2 A:

The JAXP SAX API with a SAX filter is both fast and efficient. A good introduction to filters can be seen here
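A rough sketch of how such a filter could do the splitting, using only the JDK. Everything named here (SaxSplitter, split, the sink callback) is made up for illustration and is not part of JAXP. It assumes a non-namespace-aware parser, so xmlns declarations arrive as ordinary attributes and get copied along with the ancestors' start tags. It also drops comments, processing instructions and any text outside the target element.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical helper: emits one small document per target element. */
public class SaxSplitter extends XMLFilterImpl {
    private final String target;          // qualified name to split on, e.g. "b"
    private final Consumer<String> sink;  // receives each finished fragment
    // Open ancestors, innermost first; each entry is {qName, serialized start tag}.
    private final Deque<String[]> ancestors = new ArrayDeque<>();
    private StringBuilder out;            // non-null only while inside a target element
    private int depth;                    // element nesting depth inside the fragment

    public SaxSplitter(String target, Consumer<String> sink) {
        this.target = target;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Serialize the start tag, including any xmlns attributes.
        StringBuilder tag = new StringBuilder("<").append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            tag.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        tag.append('>');
        if (out == null && qName.equals(target)) {
            out = new StringBuilder();
            // Replay the ancestors' start tags, outermost first.
            Iterator<String[]> it = ancestors.descendingIterator();
            while (it.hasNext()) out.append(it.next()[1]);
        }
        if (out != null) {
            out.append(tag);
            depth++;
        } else {
            ancestors.push(new String[] {qName, tag.toString()});
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (out == null) {
            ancestors.pop();
            return;
        }
        out.append("</").append(qName).append('>');
        if (--depth == 0) {
            // Close the replayed ancestors, innermost first, then hand off.
            for (String[] a : ancestors) out.append("</").append(a[0]).append('>');
            sink.accept(out.toString());
            out = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (out != null) out.append(escape(new String(ch, start, length)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void split(Reader xml, String target, Consumer<String> sink) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // keep xmlns declarations as plain attributes
        XMLReader reader = factory.newSAXParser().getXMLReader();
        SaxSplitter splitter = new SaxSplitter(target, sink);
        splitter.setParent(reader);
        splitter.parse(new InputSource(xml));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a xmlns:x=\"urn:x\"><b>text1</b> <b>text2</b></a>";
        split(new StringReader(xml), "b", System.out::println);
    }
}
```

Only the open ancestors and the current fragment are held in memory, so the input can be arbitrarily large as long as each fragment is small. For real use the sink would write each fragment to a file or queue rather than print it.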

Jimmy 2009-10-28 22:20:38

Hmm, I'm not quite getting it. I can see how to catch the event when my tag gets parsed, but it's not clear to me how to get the filter to redirect the stream to a new document until the end tag, or how to include the parents and their various namespace bindings. Any chance you can expand on this a bit? I'm aware I can do this with plain SAX by catching all kinds of events, keeping track of things and copying everything by hand, but I was hoping there is an easier solution.

Carsten 2009-10-29 05:45:17

It's not the easy solution I hoped for, but it is correct and nobody had a better suggestion, so I'll give it to you ...

Carsten 2009-11-01 23:30:14

Answer 2

A:

I happen to like the XOM XML library, as its interface is simple, intuitive and powerful. To do what you want with XML, you can use your own NodeFactory and (for example) override the finishMakingElement() method. If it is making the element that you want (in your case, <b>) then you pass it along to whatever you need to do with it.

Adam Batkin 2009-10-28 22:25:09

Answer 3

A:

Am I a maverick in suggesting regular expressions...?

Neil Coffey 2009-10-28 23:42:39

not if you supply the expression ....

Carsten 2009-10-29 03:00:20

Well... without seeing the file...

Neil Coffey 2009-10-29 04:45:21

Answer 4

A:

As an XML splitter, VTD-XML is ideally suited for this task... it is also more memory efficient than DOM. The key method that simplifies the coding is VTDNav's getElementFragment()... Below is Java code that splits input.xml into out0.xml and out1.xml. It turns

<a> <b> text1 </b>  <b> text2 </b> </a>

into

<a> <b> text1</b> </a>

and

<a> <b> text2</b> </a>

using XPath

/a/b

The code

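import java.io.*;
import com.ximpleware.*;

public class Split {
    public static void main(String[] argv) throws Exception {
        VTDGen vg = new VTDGen();
        // Parse the whole file with namespace awareness turned on.
        if (vg.parseFile("c:/split/input.xml", true)) {
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("/a/b");
            int k = 0;
            byte[] ba = vn.getXML().getBytes();
            // evalXPath() moves the cursor to the next match and returns -1 when done.
            while (ap.evalXPath() != -1) {
                // try-with-resources closes each output file once it is written.
                try (FileOutputStream fos = new FileOutputStream("c:/split/out" + k + ".xml")) {
                    fos.write("<a>".getBytes());
                    // Lower 32 bits: offset of the fragment; upper 32 bits: its length.
                    long l = vn.getElementFragment();
                    fos.write(ba, (int) l, (int) (l >> 32));
                    fos.write("</a>".getBytes());
                }
                k++;
            }
        }
    }
}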
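
The cast-and-shift applied to the value returned by getElementFragment() is easy to misread. As the code above uses it, the low 32 bits hold the offset and the high 32 bits hold the length. A stand-alone sketch with no VTD-XML dependency (FragmentBits and its methods are made-up names, for illustration only):

```java
public class FragmentBits {
    // Pack an (offset, length) pair the same way the split code unpacks it.
    public static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    // Low 32 bits: where the fragment starts.
    public static int offset(long fragment) {
        return (int) fragment;
    }

    // High 32 bits: how long the fragment is.
    public static int length(long fragment) {
        return (int) (fragment >> 32);
    }

    public static void main(String[] args) {
        long fragment = pack(4, 17);
        System.out.println(offset(fragment) + " " + length(fragment)); // prints "4 17"
    }
}
```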

For further reading, please visit http://www.devx.com/xml/Article/36379

vtd-xml-author 2009-10-29 01:28:26

Thanks for your reply. This looks like a DOM-style approach to me, requiring the whole document to be parsed before doing a query. My XML stream is too big for that; it needs to be done by a streaming parser.

Carsten 2009-10-29 02:54:18

The extended edition can do partial loading via memory mapping, but that feature is only available in the extended edition. With the standard version, 2GB is the most you can load, and it only consumes around 1/5 the memory of DOM...

vtd-xml-author 2009-10-29 03:19:32

Answer 5

A:

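Go old school:

// pstmtout (the input file), endStatementTag and service are defined elsewhere.
StringBuilder buffer = new StringBuilder(1024 * 50);
try (BufferedReader reader = new BufferedReader(new FileReader(pstmtout))) {
  String line;
  while ((line = reader.readLine()) != null) {
    // readLine() strips the line terminator, so put one back.
    buffer.append(line).append('\n');
    // Only works if the end tag sits alone on its line, with no surrounding whitespace.
    if (line.equalsIgnoreCase(endStatementTag)) {
      service.handle(buffer.toString());
      buffer.setLength(0);
    }
  }
}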

2010-05-20 19:57:20
