I currently have a Java SAX parser extracting some info from a 30 GB XML file. For each of several million elements, it is:
- reading the XML node,
- storing it in a string object,
- running some regexes on the string,
- storing the results to the database
I'm running this on a machine with 16 GB of memory, but the memory is nowhere near fully utilized. Is there a simple way to dynamically buffer about 10 GB worth of data from the input file? I suspect I could hand-roll a producer/consumer multithreaded version of this (loading the objects on one side, using and discarding them on the other; rough sketches of both ideas below), but damn it, XML is ancient now, are there no efficient libraries to crunch 'em?
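For context, the only simple form of buffering I know of is at the stream level, something like the snippet below (the 1 MB buffer size is picked arbitrarily, and the empty `DefaultHandler` stands in for my real handler). That smooths out disk I/O, but it will obviously never come close to holding 10 GB in RAM:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

public class StreamBufferedParse {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // A large read-ahead buffer between disk and parser; it batches
        // reads, but never holds more than 1 MB, let alone 10 GB.
        try (InputStream in = new BufferedInputStream(
                new FileInputStream("huge.xml"), 1 << 20)) {
            parser.parse(in, new DefaultHandler() { /* my real handler goes here */ });
        }
    }
}
```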
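And here is a rough sketch of the producer/consumer version I'm imagining, with all names and sizes hypothetical: the SAX thread fills a bounded `BlockingQueue` while a second thread drains it, so the queue capacity effectively becomes the in-memory buffer.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PipelineSketch {
    // Bounded queue of extracted element text. Capacity x average element
    // size is the real "buffer"; 100k entries is a made-up number.
    static final BlockingQueue<String> QUEUE = new ArrayBlockingQueue<>(100_000);
    static final String EOF = new String("EOF"); // sentinel, compared by identity

    public static void main(String[] args) throws Exception {
        Thread producer = new Thread(() -> {
            try (InputStream in = new BufferedInputStream(new FileInputStream("huge.xml"))) {
                DefaultHandler handler = new DefaultHandler() {
                    final StringBuilder text = new StringBuilder();
                    @Override public void startElement(String uri, String local,
                                                       String qName, Attributes atts) {
                        text.setLength(0); // naive: ignores nesting, this is a sketch
                    }
                    @Override public void characters(char[] ch, int start, int len) {
                        text.append(ch, start, len);
                    }
                    @Override public void endElement(String uri, String local, String qName) {
                        try {
                            QUEUE.put(text.toString()); // blocks when the buffer is full
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                    }
                };
                SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
                QUEUE.put(EOF); // tell the consumer we're done
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (String s = QUEUE.take(); s != EOF; s = QUEUE.take()) {
                    // run the regexes on s and store the matches to the database
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```

The point of the bounded queue is that `put` blocks when the consumer falls behind, so memory use stays capped at roughly (capacity x element size) without any manual bookkeeping. If there's a library that already does this pipelining for me, that's exactly what I'm asking for.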