I have huge XML files, up to 1-2 GB each, and obviously I can't parse a whole file at once. I'd have to split it into parts, then parse the parts and process them.
How can I count the occurrences of a certain node, so I can work out how many parts I need to split the file into? Is there maybe a better way to do this? I'm open to all suggestions, thank you.
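For reference, counting one kind of element does not require loading the file. Below is a minimal sketch using the StAX cursor API (XMLStreamReader), which reads the document once and keeps only the current event in memory. The class name NodeCounter and the command-line arguments are made up for illustration, and the sketch assumes elements are matched by local name only, ignoring namespaces.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: counts elements with a given local name in one streaming pass.
final class NodeCounter {

    static long count(InputStream in, String elementName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long count = 0;
        try {
            while (reader.hasNext()) {
                // next() advances to the following event; nothing earlier is retained.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close(); // closes the reader, not the underlying stream
        }
        return count;
    }

    // Usage (assumed arguments): java NodeCounter <xml file> <element name>
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(count(in, args[1]));
        }
    }
}
```

Memory use here should stay roughly constant regardless of file size, since no events or text are accumulated.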
Question update:
Well, I did use StAX, but maybe the logic I'm using it for is wrong. I parse the file, and for each node I get the node value and store it in a StringBuilder. Then in another method I go through the StringBuilder and edit the output. Then I write that output to a file. I can't process more than 10000 objects this way.
Here is the exception I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.sun.org.apache.xerces.internal.util.NamespaceSupport.<init>(Unknown Source)
    at com.sun.xml.internal.stream.events.XMLEventAllocatorImpl.setNamespaceContext(Unknown Source)
    at com.sun.xml.internal.stream.events.XMLEventAllocatorImpl.getXMLEvent(Unknown Source)
    at com.sun.xml.internal.stream.events.XMLEventAllocatorImpl.allocate(Unknown Source)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.bridge(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.parse(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
Actually, I think my whole approach is wrong. What I'm actually trying to do is convert XML files into CSV samples. Here is how I do it so far:
- Read/parse the XML file
- For each element node, get the text node value
- Open a stream and write the values to a temp file for n nodes, then flush and close the stream
- Then open another stream, read from the temp file, use commons strip utils and some other processing to create proper CSV output, then write it to the CSV file
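One possible alternative to the steps above is to skip the temp file and write each CSV row as soon as its record has been read, so only one record is held in memory at a time. The sketch below is only an illustration under assumptions: the class name XmlToCsv and the recordName parameter are made up, each record is assumed to be an element whose direct children become the columns, and String.trim() stands in for the strip step.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical converter: one CSV row per <recordName> element, one column per direct child.
final class XmlToCsv {

    static void convert(Reader xml, Writer csv, String recordName)
            throws XMLStreamException, IOException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        StringBuilder row = new StringBuilder();   // holds the current record only
        StringBuilder field = new StringBuilder(); // holds the current column only
        boolean inRecord = false;
        int depth = 0; // nesting depth below the record element
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (!inRecord && recordName.equals(reader.getLocalName())) {
                        inRecord = true;
                        row.setLength(0);
                    } else if (inRecord && ++depth == 1) {
                        field.setLength(0); // a new column starts
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    if (inRecord && depth >= 1) {
                        field.append(reader.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && inRecord) {
                    if (depth > 0) {
                        if (--depth == 0) { // a column just ended
                            if (row.length() > 0) row.append(',');
                            row.append(escape(field.toString().trim()));
                        }
                    } else { // the record itself ended: emit the row right away
                        csv.write(row.toString());
                        csv.write('\n');
                        inRecord = false;
                    }
                }
            }
        } finally {
            reader.close();
        }
        csv.flush();
    }

    // Quotes a value if it contains a comma, quote, or line break, doubling inner quotes.
    private static String escape(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }
}
```

Wrapping the output in a BufferedWriter around a FileWriter would keep disk writes efficient. An empty first column is not distinguished from a missing one in this sketch, so it would need extra handling if the data can contain blank leading fields.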